Summary

This is a common garden experiment in which individual fish (FISH_ID) were exposed to four different TEMPERATURE treatments (i.e., 27\(^\circ\)C, 28.5\(^\circ\)C, 30\(^\circ\)C, and 31.5\(^\circ\)C). Fish sampled in this experiment were collected from six different POPULATIONS spread across two REGIONS: a low- and a high-latitude region (i.e., three populations from each region).

Fish were first sampled at 27\(^\circ\)C and then subsequently at higher temperatures. Once all fish were tested at a given treatment temperature, the treatment temperature was increased by 1.5\(^\circ\)C over three days. Fish were then given an additional five days to adjust to the temperature increase.

To determine the MAX metabolic rate, fish were placed in a swim tunnel for ten minutes. The first five minutes were used to slowly increase the water flow within the swim tunnel until fish reached the point where they would alternate between pectoral swimming and lateral body undulations (i.e., gait change). The water flow within the swim tunnel was maintained at gait-change speed for an additional five minutes. Afterwards fish were immediately placed within a designated respirometry CHAMBER, with air saturation rates monitored for 3.5-4 hours on a four-minute measure, three-minute flush, and five-second wait cycle.
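As a quick sanity check on trial length, one full measure-flush-wait cycle lasts 4 min + 3 min + 5 s, so a 3.5-4 hour trial yields roughly 29-33 measurement periods. This is a back-of-envelope calculation, not part of the analysis pipeline:

```r
# One full respirometry cycle: 4 min measure + 3 min flush + 5 s wait.
cycle_min <- 4 + 3 + 5 / 60                 # ~7.08 minutes per cycle
floor(3.5 * 60 / cycle_min)                 # measurement periods in 3.5 h (29)
floor(4 * 60 / cycle_min)                   # measurement periods in 4 h (33)
```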

Maximum metabolic rate was determined by extracting the steepest sixty-second interval slope from the first (sometimes, but rarely, second or third) measurement period. RESTING metabolic rate was determined by extracting the ten shallowest slopes recorded over the length of the experiment.
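The steepest-interval idea can be sketched as a rolling regression. This is a hypothetical helper, not the actual pipeline (which used respR); it assumes oxygen readings at 1 Hz so that a 60-row window spans sixty seconds:

```r
# Extract the steepest (most negative) sliding-window slope from a
# measurement period, assuming one oxygen reading per second.
steepest_slope <- function(o2, window = 60) {
  stopifnot(length(o2) >= window)
  slopes <- vapply(seq_len(length(o2) - window + 1), function(i) {
    y <- o2[i:(i + window - 1)]
    x <- seq_len(window)
    unname(coef(lm(y ~ x))[2])             # per-second slope in this window
  }, numeric(1))
  min(slopes)                              # steepest decline
}

steepest_slope(100 - 0.5 * (1:120))        # constant decline: returns -0.5
```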

Fish were sampled in random order. The order in which fish were sampled determined the SUMP/SYSTEM the fish was run on as well as the chamber it was placed in (i.e., the first fish sampled went into chamber 1 on sump/system 1, the second fish into chamber 1 on sump/system 2, the third fish into chamber 2 on sump/system 1, etc.).
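The rotation described above amounts to simple modular arithmetic: fish alternate between the two sumps/systems, moving to the next chamber after every pair of fish. `assign_fish` is a hypothetical helper for illustration only, not part of the analysis:

```r
# Map sampling order (i = 1, 2, 3, ...) to a sump/system and chamber.
assign_fish <- function(i) {
  c(sump = (i - 1) %% 2 + 1,        # alternates 1, 2, 1, 2, ...
    chamber = (i - 1) %/% 2 + 1)    # advances after every two fish
}
assign_fish(1)   # sump 1, chamber 1
assign_fish(2)   # sump 2, chamber 1
assign_fish(3)   # sump 1, chamber 2
```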

Immunocompetence was tested via phytohaemagglutinin (PHA) swelling assays at the same four experimental temperatures at which metabolic performance was tested. To perform the assay, fish were injected subcutaneously in the caudal peduncle with 0.03 mL of PHA. Thickness of the injection site was measured pre-injection as well as 18-24 hours post-injection. PHA produces a localized, cell-mediated response (e.g., inflammation, T-cell proliferation, etc.). The change in thickness between measurement periods was used as a proxy for immunocompetence.
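The swelling proxy is simple arithmetic: post-injection thickness minus pre-injection thickness. A minimal sketch; the column names and values are illustrative, not from the actual data file:

```r
# Immunocompetence proxy: change in injection-site thickness.
pha <- data.frame(PRE_MM = c(1.10, 1.05), POST_MM = c(1.42, 1.31))
pha$SWELLING_MM <- pha$POST_MM - pha$PRE_MM
pha$SWELLING_MM    # 0.32 0.26
```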

Two weeks after live-animal testing concluded, blood and tissue samples were collected from each fish. White muscle tissue samples were used to assess the activity levels of two different enzymes: lactate dehydrogenase (LDH; anaerobic) and citrate synthase (CS; aerobic). Blood samples were used to determine haematocrit ratios.

[Figure: map of Acanthochromis polyacanthus collection regions on the Great Barrier Reef]

Glossary of Terms

EXP_FISH_ID Combined FISH_ID with TEMPERATURE the fish was tested at
FISH_ID Unique alphanumeric code provided to fish
POPULATION Population/Reef the fish was collected from
REGION Region (i.e., core or leading) the fish was collected from
TEMPERATURE Temperature fish was tested at
MASS Mass of the fish
RESTING_DATE Date the resting metabolic rate was recorded for each fish
RESTING_CHAMBER Respirometry chamber the fish was tested in for RMR
RESTING_SYSTEM Respirometry system (i.e. dell or asus) fish was tested with
RESTING_SUMP Respirometry sump (i.e., 1 or 2) fish was tested in
RESTING_AM_PM Time of day (i.e. morning or afternoon) fish was tested
RESTING_START_TIME Time that the fish was placed inside the respirometry chamber
RESTING Resting metabolic rate (RMR)
RESTING_MgO2.hr Resting metabolic rate divided by mass
MAX_DATE Date that maximum metabolic rate was recorded
MAX_CHAMBER Respirometry chamber the fish was tested in for MMR
MAX_SYSTEM Respirometry system fish was tested with for MMR
MAX_SUMP Respirometry sump (i.e., 1 or 2) fish was tested in for MMR
MAX_AM_PM Time of day (i.e. morning or afternoon) fish was tested for MMR
MAX_START_TIME Time that the fish was placed inside the chamber for MMR
MAX Maximum metabolic rate (MMR)
MAX_MgO2.hr Maximum metabolic rate divided by mass
FAS Factorial metabolic rate (MMR/RMR)
NAS Net metabolic rate (MMR - RMR)
MgO2.hr_Net Net metabolic rate divided by mass
Swim.performance Fish swim performance in swim tunnel (i.e., good, okay, poor)
Notes Additional experimental notes
MASS_CENTERED Mass of fish (centered)

Load packages

Let's start by loading the packages that are needed.

library(tidyverse) # data manipulation
library(janitor) # data manipulation
library(plyr) # data manipulation
library(dplyr) # data manipulation
library(lubridate) # data manipulation - specifically time data
library(ggplot2) # plotting figures
library(glmmTMB) # running models
library(performance) # model validation
library(chron) # data manipulation - specifically time data
library(DHARMa) # model validation
library(MuMIn) # model validation
library(kableExtra) # creating tables
library(broom) # tidying model output
library(emmeans) # post-hoc analysis
library(ggeffects) # plotting models/model validation
library(vtable) # creating tables
library(modelr) # model validation
library(sjPlot) # plotting models 
library(car) # used for Anova function 
library(ggpubr) # plotting figures

Results

Metabolic rates

Resting oxygen consumption

Scenario

For initial details on the experiment performed please read the ReadMe file. In brief, Acanthochromis polyacanthus from two different regions on the Great Barrier Reef (GBR) were tested for metabolic performance at four different temperatures: 27\(^\circ\)C, 28.5\(^\circ\)C, 30\(^\circ\)C, and 31.5\(^\circ\)C. Fish used in this study were collected from two different regions, low-latitude (i.e., Cairns) and high-latitude (i.e., Mackay); within each region fish were collected from three different populations. At each temperature, resting and maximum oxygen consumption were measured for each individual. Absolute aerobic scope was calculated using the following formula:

Absolute aerobic scope = (maximum oxygen consumption - resting oxygen consumption)
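The derived scope metrics defined in the glossary (NAS = MMR - RMR, FAS = MMR / RMR) follow directly from this formula. A minimal sketch; the mmr and rmr values below are made up purely for illustration:

```r
# Derived aerobic scope metrics as defined in the glossary.
mmr <- 12.4                 # maximum oxygen consumption (illustrative)
rmr <- 4.1                  # resting oxygen consumption (illustrative)
nas <- mmr - rmr            # absolute (net) aerobic scope, NAS
fas <- mmr / rmr            # factorial aerobic scope, FAS
```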

Individuals were first tested at 27\(^\circ\)C. Water temperature was then increased at a rate of 0.5\(^\circ\)C day^-1 until the next treatment temperature was reached. Fish were then provided with an additional 5 days to adjust to the new temperature before aerobic physiology was tested again.
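Under this schedule, the interval between successive test temperatures works out as follows (a simple arithmetic check, not code from the analysis):

```r
# 1.5 degrees C raised at 0.5 degrees C per day, plus a 5 day adjustment.
ramp_days <- 1.5 / 0.5      # 3 days of ramping
adjust_days <- 5            # adjustment at the new temperature
ramp_days + adjust_days     # 8 days between successive tests
```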

Three traits are included within the aerobic physiology analysis: resting oxygen consumption, maximum oxygen consumption, and absolute aerobic scope. Data for each metric were collected from respirometry experiments recorded via a combination of programs, including AquaResp and PyroScience. Slopes (i.e., resting and maximum oxygen consumption values) were then calculated via the respR [https://januarharianto.github.io/respR/articles/respR.html] package.

Read in the data

Before beginning, always make sure that you are working in the correct directory.

knitr::opts_knit$set(root.dir=working.dir)

Now we can import the data. Replace import.data with the PATH to your data file. I have labelled my PATH import.data (i.e., import.data = “PATH TO MY FILE”).

Load data

resp <- import.data

Data manipulation

Before the data can be analysed it is important to clean up the data file. Below, a number of adjustments are made: primarily making sure that columns are treated appropriately as factors, numeric, or time, as well as renaming some columns. Once these changes are made the data are saved into a new dataframe called resp2.

resp2 = resp %>% 
  dplyr::rename(EXP_FISH_ID = FISH_ID) %>%
  separate(EXP_FISH_ID, c("FISH_ID"), remove = FALSE) %>%
  mutate(FISH_ID = factor(FISH_ID), 
         POPULATION = factor(POPULATION), 
         REGION = factor(REGION), 
         TEMPERATURE = as.numeric(TEMPERATURE), #run with temperature as a factor
         RESTING_DATE = factor(RESTING_DATE), 
         RESTING_CHAMBER = factor(RESTING_CHAMBER), 
         RESTING_SYSTEM = factor(RESTING_SYSTEM), 
         RESTING_SUMP = factor(RESTING_SUMP), 
         RESTING_AM_PM = factor(RESTING_AM_PM), 
         RESTING_START_TIME = hms(RESTING_START_TIME),
         RESTING_END_TIME = hms(RESTING_ENDTIME),
         MAX_DATE = factor(MAX_DATE), 
         MAX_CHAMBER = factor(MAX_CHAMBER), 
         MAX_SYSTEM = factor(MAX_SYSTEM), 
         MAX_SUMP = factor(MAX_SUMP), 
         MAX_AM_PM = factor(MAX_AM_PM), 
         MAX_START_TIME = hms(MAX_START_TIME), 
         Swim.performance = factor(Swim.performance), 
         NAS = as.numeric(NAS), 
         FAS = as.numeric(FAS), 
         MgO2.hr_Net = as.numeric(MgO2.hr_Net), 
         RESTING_RUNTIME_SECONDS = as.numeric(hms(RESTING_RUNTIME))) %>% 
  dplyr::rename(MASS = WEIGHT) %>% 
  mutate(MASS_CENTERED = scale(MASS, scale = FALSE, center = TRUE))

Next, select data points will be removed. Beside each removed data point I have provided the reasoning for its exclusion, such as the fish died during the experiment or the data quality was poor - which likely indicates that there was an issue with the equipment during the trial.

resp3 <- resp2 %>% 
  subset(  
    EXP_FISH_ID !="LCHA127_27" & # deceased during experiment
      EXP_FISH_ID !="LCHA132_27" & # deceased during experiment
      EXP_FISH_ID !="LKES168_27" # poor data quality
  ) 

Great! That is everything for data manipulation

Exploratory data analysis

Mass v Rest
ggplot(resp3, aes(MASS, RESTING_MgO2.hr_RESPR)) + 
  geom_point() + 
  geom_smooth(method = "lm") + 
  theme_classic()

Mass v REST (LATITUDE)
ggplot(resp3, aes(MASS, RESTING_MgO2.hr_RESPR, color = REGION)) + 
  geom_point() +
  theme_classic() + 
  geom_smooth(method = "lm")

TEMPERATURE v REST (LATITUDE)
ggplot(resp3, aes(TEMPERATURE, RESTING_MgO2.hr_RESPR, color = REGION)) + 
  geom_point() +
  theme_classic()

Fit the model

The model was fit using the glm function and later the glmmTMB package in R. A number of different models were tested to determine which hypothesis and associated variables best predicted resting oxygen consumption. Model fit was examined using AICc, BIC, and r-squared values. Additionally, models were examined via the validation diagnostics provided by the performance and DHARMa packages in R.

The first set of models tested two different hypotheses: 1) that mass has a major impact on the resting oxygen consumption of fish (this has been documented in the literature), and 2) that variables related to the time at which experiments were performed have an important impact on the resting oxygen consumption of fish.

Fixed factors (linear regression models)
model 1
#--- base model ---#
rmr.1 <- glm(RESTING_MgO2.hr_RESPR ~ 1+ REGION * TEMPERATURE + MASS_CENTERED, 
                 family=gaussian(),
                 data = resp3) 

summary

Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.9071699 1.8396670 0.4931164 0.6225248
REGIONLeading -1.0631773 2.7553334 -0.3858616 0.7000498
TEMPERATURE 0.1759916 0.0632351 2.7831307 0.0059520
MASS_CENTERED 133.8254357 9.9843605 13.4035060 0.0000000
REGIONLeading:TEMPERATURE 0.0376457 0.0946023 0.3979365 0.6911434
model 2
#--- experimental rmr equipment hypothesis ---#
rmr.2 <- glm(RESTING_MgO2.hr_RESPR ~ 1+ REGION * TEMPERATURE + RESTING_SUMP + 
                   RESTING_AM_PM + RESTING_RUNTIME_SECONDS, 
                 family=gaussian(),
                 data = resp3) 

summary

Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.0417266 2.5593818 0.0163034 0.9870104
REGIONLeading -1.8177666 3.7681132 -0.4824077 0.6301025
TEMPERATURE 0.3212288 0.0949708 3.3823948 0.0008816
RESTING_SUMP2 -0.0016294 0.2215833 -0.0073533 0.9941411
RESTING_AM_PMPM -0.4196626 0.2161103 -1.9418904 0.0537114
RESTING_RUNTIME_SECONDS -0.0001616 0.0000513 -3.1477047 0.0019264
REGIONLeading:TEMPERATURE 0.0125324 0.1293602 0.0968801 0.9229294
model comparison table
model df AICc BIC r2
rmr.1 6 561.9094 580.8294 0.6058512
rmr.2 8 680.4894 705.5293 0.2768734

The model that contains MASS_CENTERED seems to do better than the model that incorporates variables associated with the time that experiments were performed. This is demonstrated by the lower AICc and BIC scores, as well as the higher r-squared value. However, RESTING_RUNTIME_SECONDS was a significant variable in model 2. Let’s see how a third model looks if we include both MASS_CENTERED and RESTING_RUNTIME_SECONDS.

model 3
rmr.3 <- glm(RESTING_MgO2.hr_RESPR ~ 1+ REGION * TEMPERATURE + MASS_CENTERED + RESTING_RUNTIME_SECONDS, 
                 family=gaussian(),
                 data = resp3)
model comparison table 2
model df AICc BIC r2
rmr.1 6 561.9094 580.8294 0.6058512
rmr.2 8 680.4894 705.5293 0.2768734
rmr.3 7 544.9015 566.8936 0.6426906

It looks like the third model is better than the previous two. Next we will test whether the variable temperature performs best as a 1^st (linear), 2^nd (quadratic), or 3^rd (cubic) order polynomial, as the relationship between temperature and resting oxygen consumption is predicted to be non-linear.

Polynomials
polynomial models

Note that the linear model has already been created via model rmr.3 in the previous section.

rmr.3.p2 <- glm(RESTING_MgO2.hr_RESPR ~ 1+ REGION * poly(TEMPERATURE, 2) + MASS_CENTERED + RESTING_RUNTIME_SECONDS, 
                 family=gaussian(),
                 data = resp3)  

rmr.3.p3 <- glm(RESTING_MgO2.hr_RESPR ~ 1+ REGION * poly(TEMPERATURE, 3) + MASS_CENTERED + RESTING_RUNTIME_SECONDS, 
                 family=gaussian(),
                 data = resp3)

polynomial model comparisons

model df AICc BIC r2
rmr.3 7 544.9015 566.8936 0.6489236
rmr.3.p2 9 545.1143 573.1773 0.6566813
rmr.3.p3 11 548.8463 582.8799 0.6580731

From our model comparison we can see that there is no additional benefit to including temperature as a 2^nd or 3^rd order polynomial. However, the linear and quadratic models both perform well.

Random factors

Fish were repeatedly sampled over four different temperatures, therefore repeated sampling needs to be accounted for. To do this, random factors will be included within the model. There are a number of options for random factors, including: 1) accounting for repeated sampling of individuals, 2) accounting for repeated sampling of individuals nested within population, and 3) accounting for repeated sampling of individuals and populations without nesting. All three models will be run and compared.

random factor models
rmr.3a <- glmmTMB(RESTING_MgO2.hr_RESPR ~ 1+ REGION * TEMPERATURE + MASS_CENTERED + RESTING_RUNTIME_SECONDS + RESTING_AM_PM + (1|FISH_ID), 
                  family=gaussian(),
                  data = resp3,
                  REML = TRUE) 


rmr.3b <- glmmTMB(RESTING_MgO2.hr_RESPR ~ 1+ REGION * TEMPERATURE + MASS_CENTERED + RESTING_RUNTIME_SECONDS + RESTING_AM_PM + (1|POPULATION/FISH_ID), 
                  family=gaussian(),
                  data = resp3,
                  REML = TRUE)

rmr.3c <- glmmTMB(RESTING_MgO2.hr_RESPR ~ 1+ REGION * TEMPERATURE + MASS_CENTERED + RESTING_RUNTIME_SECONDS + RESTING_AM_PM + (1|FISH_ID) + (1|POPULATION), 
                  family=gaussian(),
                  data = resp3,
                  REML = TRUE)

random factor model comparisons

model df AICc BIC r2m r2c
rmr.3a 9 559.5705 587.6335 0.6508181 0.7311944
rmr.3b 10 561.8036 592.8646 0.6508181 0.7311944
rmr.3c 10 561.8036 592.8646 0.6508181 0.7311944

Model rmr.3a appears to be the best model; however, model fit seems essentially unchanged regardless of how the random factors are arranged.

Model validation

performance
rmr.3a (linear)

The rmr.3a model performs well; however, in the model validation performed by the performance package it looks like there are two variables that are highly correlated. If we expand the figure we can see that the highly correlated variables are REGION and REGION:TEMPERATURE. Perhaps this is unsurprising, but let’s see what happens when we run the quadratic (2^nd order polynomial) model to see if this helps deal with the high correlation between these two variables, as it performed very similarly to rmr.3a and even had a higher r2 value.

rmr.3.p2a (quadratic)

First we need to update the model by adding in the missing random factor

rmr.3.p2a <- glmmTMB(RESTING_MgO2.hr_RESPR ~ 1+ REGION * poly(TEMPERATURE, 2) + MASS_CENTERED + RESTING_RUNTIME_SECONDS + (1|FISH_ID), 
                 family=gaussian(),
                 data = resp3, 
                 REML = TRUE) 

DHARMa residuals
rmr.3a (linear)
rmr.3a %>% simulateResiduals(plot=TRUE)

## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help. 
##  
## Scaled residual values: 0.464 0.76 0.164 0.916 0.844 0.804 0.772 0.816 0.748 0.848 0.328 0.152 0.096 0.14 0 0.308 0.236 0.304 0.372 0.028 ...
rmr.3a %>% DHARMa::testResiduals(plot=TRUE)

## $uniformity
## 
##  Asymptotic one-sample Kolmogorov-Smirnov test
## 
## data:  simulationOutput$scaledResiduals
## D = 0.040898, p-value = 0.9132
## alternative hypothesis: two-sided
## 
## 
## $dispersion
## 
##  DHARMa nonparametric dispersion test via sd of residuals fitted vs.
##  simulated
## 
## data:  simulationOutput
## dispersion = 0.96366, p-value = 0.744
## alternative hypothesis: two.sided
## 
## 
## $outliers
## 
##  DHARMa outlier test based on exact binomial test with approximate
##  expectations
## 
## data:  simulationOutput
## outliers at both margin(s) = 1, observations = 187, p-value = 1
## alternative hypothesis: true probability of success is not equal to 0.007968127
## 95 percent confidence interval:
##  0.0001353802 0.0294332214
## sample estimates:
## frequency of outliers (expected: 0.00796812749003984 ) 
##                                            0.005347594
rmr.3.p2a (quadratic)

rmr.3.p2a %>% simulateResiduals(plot=TRUE) 

## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help. 
##  
## Scaled residual values: 0.444 0.784 0.164 0.932 0.884 0.764 0.688 0.852 0.792 0.824 0.308 0.168 0.16 0.06 0 0.408 0.212 0.272 0.272 0.036 ...
rmr.3.p2a %>% DHARMa::testResiduals(plot=TRUE)

## $uniformity
## 
##  Asymptotic one-sample Kolmogorov-Smirnov test
## 
## data:  simulationOutput$scaledResiduals
## D = 0.04477, p-value = 0.8477
## alternative hypothesis: two-sided
## 
## 
## $dispersion
## 
##  DHARMa nonparametric dispersion test via sd of residuals fitted vs.
##  simulated
## 
## data:  simulationOutput
## dispersion = 0.96383, p-value = 0.76
## alternative hypothesis: two.sided
## 
## 
## $outliers
## 
##  DHARMa outlier test based on exact binomial test with approximate
##  expectations
## 
## data:  simulationOutput
## outliers at both margin(s) = 1, observations = 187, p-value = 1
## alternative hypothesis: true probability of success is not equal to 0.007968127
## 95 percent confidence interval:
##  0.0001353802 0.0294332214
## sample estimates:
## frequency of outliers (expected: 0.00796812749003984 ) 
##                                            0.005347594

It looks like the model that treats temperature as a second order polynomial does a better job of avoiding high levels of collinearity within the model. The quadratic model will be used moving forward because:

  • it performs just as well as the linear model based on the model comparison scores (i.e., AICc, BIC, and r2)
  • it does a better job of dealing with the collinearity that appeared in the linear model
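The collinearity benefit comes from poly() using orthogonal polynomials: its columns are uncorrelated by construction, whereas raw temperature and its square are almost perfectly correlated over this narrow range. A quick base-R check using the four treatment temperatures:

```r
# poly() builds orthogonal polynomial columns, so the linear and quadratic
# terms are uncorrelated; raw temperature and its square are not.
temps <- c(27, 28.5, 30, 31.5)          # the four treatment temperatures
cor(temps, temps^2)                      # close to 1: raw terms nearly collinear
p <- poly(temps, 2)
cor(p[, 1], p[, 2])                      # ~0: orthogonal columns
```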

Partial plots

ggemmeans

plot_model

Model investigation

summary
Estimate StdError Zvalue Pvalue
(Intercept) 8.3165021 0.5180504 16.0534626 0.0000000
REGIONLeading 0.0172115 0.2341284 0.0735129 0.9413980
poly(TEMPERATURE, 2)1 6.5820041 1.3078841 5.0325592 0.0000005
poly(TEMPERATURE, 2)2 2.7374223 1.1855357 2.3090171 0.0209426
MASS_CENTERED 133.3083220 12.2781446 10.8573670 0.0000000
RESTING_RUNTIME_SECONDS -0.0001467 0.0000321 -4.5683146 0.0000049
REGIONLeading:poly(TEMPERATURE, 2)1 0.8398236 1.7827144 0.4710926 0.6375746
REGIONLeading:poly(TEMPERATURE, 2)2 -2.2347739 1.7668476 -1.2648368 0.2059298
Anova
Chisq Df Pr(>Chisq)
REGION 0.003263 1 0.9544475
poly(TEMPERATURE, 2) 51.567789 2 0.0000000
MASS_CENTERED 117.882419 1 0.0000000
RESTING_RUNTIME_SECONDS 20.869498 1 0.0000049
REGION:poly(TEMPERATURE, 2) 1.823912 2 0.4017376
confint
2.5 % 97.5 % Estimate
(Intercept) 7.3011420 9.3318621 8.3165021
REGIONLeading -0.4416717 0.4760947 0.0172115
poly(TEMPERATURE, 2)1 4.0185984 9.1454098 6.5820041
poly(TEMPERATURE, 2)2 0.4138150 5.0610297 2.7374223
MASS_CENTERED 109.2436009 157.3730432 133.3083220
RESTING_RUNTIME_SECONDS -0.0002096 -0.0000837 -0.0001467
REGIONLeading:poly(TEMPERATURE, 2)1 -2.6542324 4.3338797 0.8398236
REGIONLeading:poly(TEMPERATURE, 2)2 -5.6977316 1.2281839 -2.2347739
Std.Dev.(Intercept)|FISH_ID 0.3510707 0.7291319 0.5059415
r-squared
R2_conditional R2_marginal optional
0.7370331 0.6487262 FALSE

Pairwise comparisons

emtrends [latitudes]
rmr.3.p2a %>% emtrends(var = "TEMPERATURE", type = "response") %>% pairs(by = "TEMPERATURE") %>% summary(by = NULL, adjust = "tukey", infer=TRUE)


The numbers in the left-most column of the table indicate that the slopes assume mean MASS_CENTERED and RESTING_RUNTIME_SECONDS values when comparing slopes between latitudes.

emmeans [latitudes]
rmr.3.p2a %>% emmeans(pairwise ~ TEMPERATURE*REGION, type = "response") %>% pairs(by = "TEMPERATURE") %>% summary(by = NULL, adjust = "tukey", infer=TRUE)
temperature
rmr.3.p2a %>% emmeans(~ TEMPERATURE*REGION, type = "response")  %>% summary(infer=TRUE)
Means - f(temperature)
rmr.3.p2a %>% update(.~1+ REGION * as.factor(TEMPERATURE) + MASS_CENTERED + RESTING_RUNTIME_SECONDS + (1|FISH_ID)) %>% 
  emmeans(~REGION*TEMPERATURE, type = "response") %>% summary(infer=TRUE)
Abs. diff - f(temperature)
rmr.3.p2a %>% update(.~1+ REGION * as.factor(TEMPERATURE) + MASS_CENTERED + RESTING_RUNTIME_SECONDS + (1|FISH_ID)) %>% 
  emmeans(~REGION*TEMPERATURE, type = "response") %>% pairs(by ="REGION") %>% summary(infer=TRUE)
Effect size
rmr.emm <- rmr.3.p2a %>% emmeans(~REGION*TEMPERATURE)
eff_size(rmr.emm, sigma = sigma(rmr.3.p2a), edf=df.residual(rmr.3.p2a))
##  contrast                                                              
##  Core TEMPERATURE29.0855614973262 - Leading TEMPERATURE29.0855614973262
##  effect.size    SE  df lower.CL upper.CL
##       -0.249 0.324 185   -0.889    0.391
## 
## sigma used for effect sizes: 0.8731 
## Confidence level used: 0.95

Summary figure

Conclusion

  • In conclusion, resting oxygen consumption is significantly positively correlated with temperature; however, there is no significant difference in resting oxygen consumption between the low- and high-latitude regions.

Maximum oxygen consumption

Scenario

For initial details on the experiment performed please read the ReadMe file. In brief, Acanthochromis polyacanthus from two different regions on the Great Barrier Reef (GBR) were tested for metabolic performance at four different temperatures: 27\(^\circ\)C, 28.5\(^\circ\)C, 30\(^\circ\)C, and 31.5\(^\circ\)C. Fish used in this study were collected from two different regions, low-latitude (i.e., Cairns) and high-latitude (i.e., Mackay); within each region fish were collected from three different populations. At each temperature, resting and maximum oxygen consumption were measured for each individual. Absolute aerobic scope was calculated using the following formula:

Absolute aerobic scope = (maximum oxygen consumption - resting oxygen consumption)

Individuals were first tested at 27\(^\circ\)C. Water temperature was then increased at a rate of 0.5\(^\circ\)C day^-1 until the next treatment temperature was reached. Fish were then provided with an additional 5 days to adjust to the new temperature before aerobic physiology was tested again.

Three traits are included within the aerobic physiology analysis: resting oxygen consumption, maximum oxygen consumption, and absolute aerobic scope. Data for each metric were collected from respirometry experiments recorded via a combination of programs, including AquaResp and PyroScience. Slopes (i.e., resting and maximum oxygen consumption values) were then calculated via the respR [https://januarharianto.github.io/respR/articles/respR.html] package.

Read in the data

Before beginning, always make sure that you are working in the correct directory.

knitr::opts_knit$set(root.dir=working.dir)

Now we can import the data. Replace import.data with the PATH to your data file. I have labelled my PATH import.data (i.e., import.data = “PATH TO MY FILE”).

Load data

resp <- import.data

Data manipulation

Before the data can be analysed it is important to clean up the data file. Below, a number of adjustments are made: primarily making sure that columns are treated appropriately as factors, numeric, or time, as well as renaming some columns. Once these changes are made the data are saved into a new dataframe called resp2.

resp2 = resp %>% 
  dplyr::rename(EXP_FISH_ID = FISH_ID) %>%
  separate(EXP_FISH_ID, c("FISH_ID"), remove = FALSE) %>%
  mutate(FISH_ID = factor(FISH_ID), 
         POPULATION = factor(POPULATION), 
         REGION = factor(REGION), 
         TEMPERATURE = as.numeric(TEMPERATURE), #run with temperature as a factor
         RESTING_DATE = factor(RESTING_DATE), 
         RESTING_CHAMBER = factor(RESTING_CHAMBER), 
         RESTING_SYSTEM = factor(RESTING_SYSTEM), 
         RESTING_SUMP = factor(RESTING_SUMP), 
         RESTING_AM_PM = factor(RESTING_AM_PM), 
         RESTING_START_TIME = hms(RESTING_START_TIME),
         RESTING_END_TIME = hms(RESTING_ENDTIME),
         MAX_DATE = factor(MAX_DATE), 
         MAX_CHAMBER = factor(MAX_CHAMBER), 
         MAX_SYSTEM = factor(MAX_SYSTEM), 
         MAX_SUMP = factor(MAX_SUMP), 
         MAX_AM_PM = factor(MAX_AM_PM), 
         MAX_START_TIME = hms(MAX_START_TIME), 
         Swim.performance = factor(Swim.performance), 
         NAS = as.numeric(NAS), 
         FAS = as.numeric(FAS), 
         MgO2.hr_Net = as.numeric(MgO2.hr_Net), 
         RESTING_RUNTIME_SECONDS = as.numeric(hms(RESTING_RUNTIME))) %>% 
  dplyr::rename(MASS = WEIGHT) %>% 
  mutate(MASS_CENTERED = scale(MASS, scale = FALSE, center = TRUE))

Next, select data points will be removed. Beside each removed data point I have provided the reasoning for its exclusion, such as the fish died during the experiment or the data quality was poor - which likely indicates that there was an issue with the equipment during the trial.

resp3 <- resp2 %>% 
  subset(  
    EXP_FISH_ID !="LCHA127_27" & # deceased during experiment
      EXP_FISH_ID !="LCHA132_27" & # deceased during experiment
      EXP_FISH_ID !="LKES168_27" # poor data quality
  ) 

So far the analysis has been the same as the protocol outlined for the aerobic physiology resting data. One additional data removal step takes place in the maximum oxygen consumption analysis: samples where fish swam poorly, and whose maximum oxygen consumption data are therefore thought to be unreliable, are removed. This step is done before any data analysis has taken place.

resp4 <- resp3 %>% 
  subset(
    EXP_FISH_ID !="CSUD008_27" &  # poor swim
      EXP_FISH_ID !="CSUD008_30" &  # poor swim 
      EXP_FISH_ID !="CSUD008_28.5" & # poor swim
      EXP_FISH_ID !="CSUD018_31.5" & # poor swim 
      EXP_FISH_ID !="CSUD026_30" & # max. value low 
      EXP_FISH_ID !="CSUD074_28.5" & # fas value low 
      EXP_FISH_ID !="CSUD079_30" &
      EXP_FISH_ID !="CVLA052_27" & #nas value low 
      EXP_FISH_ID !="CVLA054_28.5" & # low max value? 
      EXP_FISH_ID !="LCHA113_27" & # poor data quality 
      EXP_FISH_ID !="LCHA113_30" & # poor swim 
      EXP_FISH_ID !="LCHA127_27" # deceased during experiment
  ) 

Exploratory data analysis

Mass v Max
ggplot(resp4, aes(MASS, MAX_MgO2.hr_RESPR)) + 
  geom_point() + 
  geom_smooth(method = "lm") + 
  theme_classic()

Mass v MAX (LATITUDE)
ggplot(resp4, aes(MASS, MAX_MgO2.hr_RESPR, color = REGION)) + 
  geom_point() +
  theme_classic() + 
  geom_smooth(method = "lm")

TEMPERATURE v MAX (LATITUDE)
ggplot(resp4, aes(TEMPERATURE, MAX_MgO2.hr_RESPR, color = REGION)) + 
  geom_point() +
  theme_classic()

Fit the model

The model was fit using the glm function and later the glmmTMB package in R. A number of different models were tested to determine which hypothesis and associated variables best predicted maximum oxygen consumption. Model fit was examined using AICc, BIC, and r-squared values. Additionally, models were examined via the validation diagnostics provided by the performance and DHARMa packages in R.

The first set of models tested two different hypotheses: 1) that mass has a major impact on the maximum oxygen consumption of fish (this has been documented in the literature), and 2) that variables related to the experimental equipment have an important impact on the maximum oxygen consumption of fish.

Fixed factors (linear regression models)
model 1
#--- base model ---#
mmr.1 <- glm(MAX_MgO2.hr_RESPR ~ 1+ REGION * TEMPERATURE + MASS_CENTERED, 
                 family=gaussian(),
                 data = resp4)  

summary

Estimate Std. Error t value Pr(>|t|)
(Intercept) -4.5748341 4.4522838 -1.027525 0.3056245
REGIONLeading 17.6468408 6.5852635 2.679747 0.0080880
TEMPERATURE 0.6944978 0.1530872 4.536615 0.0000107
MASS_CENTERED 298.1596890 24.1436460 12.349406 0.0000000
REGIONLeading:TEMPERATURE -0.6609056 0.2261837 -2.921986 0.0039482
model 2
#--- experimental rmr equipment hypothesis ---#
mmr.2 <- glm(MAX_MgO2.hr_RESPR ~ 1+ REGION * TEMPERATURE + MAX_SUMP + MAX_CHAMBER + 
                   MAX_AM_PM, 
                 family=gaussian(),
                 data = resp4) 

summary

Estimate Std. Error t value Pr(>|t|)
(Intercept) -6.2242856 6.4786135 -0.9607435 0.3380701
REGIONLeading 16.0665387 9.1812310 1.7499330 0.0819666
TEMPERATURE 0.7876835 0.2182937 3.6083665 0.0004069
MAX_SUMP2 -0.1739027 0.5317501 -0.3270383 0.7440484
MAX_CHAMBER2 0.9203132 0.7549758 1.2189970 0.2245644
MAX_CHAMBER3 1.1525268 0.7667736 1.5030861 0.1347055
MAX_CHAMBER4 0.5061699 0.8313336 0.6088649 0.5434413
MAX_AM_PMPM -0.4757712 0.5382845 -0.8838656 0.3780394
REGIONLeading:TEMPERATURE -0.7177722 0.3148211 -2.2799368 0.0238764
model comparison table
model df AICc BIC r2
mmr.1 6 828.4562 846.9821 0.6622066
mmr.2 10 945.6551 976.0266 0.3732903

The model that contains MASS_CENTERED does better than the model that incorporates variables associated with the time at which experiments were performed. This is demonstrated by its lower AICc and BIC scores, as well as its higher r-squared value.

Polynomials
polynomial models

Note that the linear model has already been created as model mmr.1 in the previous section.

mmr.1.p2 <- glm(MAX_MgO2.hr_RESPR ~ 1+ REGION * poly(TEMPERATURE, 2) + MASS_CENTERED, 
                 family=gaussian(),
                 data = resp4)

mmr.1.p3 <- glm(MAX_MgO2.hr_RESPR ~ 1+ REGION * poly(TEMPERATURE, 3) + MASS_CENTERED, 
                 family=gaussian(),
                 data = resp4)

polynomial model comparisons

model df AICc BIC r2
mmr.1 6 828.4562 846.9821 0.6673593
mmr.1.p2 8 832.3856 856.8872 0.6681820
mmr.1.p3 10 836.8419 867.2134 0.6682098

From our model comparison we can see that there is no additional benefit to including temperature as a 2nd or 3rd order polynomial. However, the linear and quadratic models both perform well.
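One reason poly() is worth keeping in play, despite the similar fit scores, is collinearity: poly() returns orthogonal polynomial columns, so the linear and quadratic terms are uncorrelated by construction, unlike raw powers of temperature. A minimal sketch:

```r
# poly() builds orthogonal polynomial contrasts: the linear and quadratic
# columns have zero correlation, whereas raw x and x^2 are highly correlated
x <- 1:10
p <- poly(x, 2)
crossprod(p)   # ~identity: off-diagonal entries are essentially 0
cor(x, x^2)    # raw powers, by contrast, are strongly correlated
```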

Random factors

Fish were repeatedly sampled over four different temperatures, so this repeated sampling needs to be accounted for. To do this, random factors will be included within the model. There are a number of options for the random-factor structure, including 1) accounting for repeated sampling of individuals, 2) accounting for repeated sampling of individuals nested within population, and 3) accounting for repeated sampling of individuals and populations without nesting. All three models will be run and compared.

random factor models
mmr.1a <- glmmTMB(MAX_MgO2.hr_RESPR ~ 1+ REGION * TEMPERATURE + MASS_CENTERED + (1|FISH_ID), 
                  family=gaussian(),
                  data = resp4,
                  REML = TRUE) 

mmr.1b <- glmmTMB(MAX_MgO2.hr_RESPR ~ 1+ REGION * TEMPERATURE + MASS_CENTERED + (1|POPULATION/FISH_ID), 
                  family=gaussian(),
                  data = resp4,
                  REML = TRUE)

mmr.1c <- glmmTMB(MAX_MgO2.hr_RESPR ~ 1+ REGION * TEMPERATURE + MASS_CENTERED + (1|FISH_ID) + (REGION|POPULATION), 
                  family=gaussian(),
                  data = resp4,
                  REML = TRUE) # Convergence problem

random factor model comparisons

model df AICc BIC r2m r2c
mmr.1a 7 809.5847 831.1115 0.6693399 0.7836732
mmr.1b 8 811.5475 836.0491 0.6694043 0.7848247
mmr.1c 10 NA NA 0.6697666 0.7848011

Model mmr.1a appears to be the best model; however, there seems to be little difference between the models regardless of how the random factors are arranged.

Model validation

performance
mmr.1a (linear)

The mmr.1a model performs well; however, in the model validation performed by the performance package it looks like two variables are highly correlated. If we expand the figure we can see that the highly correlated variables are REGION and REGION:TEMPERATURE. Perhaps this is unsurprising, but let's run the quadratic (2nd order polynomial) model to see if this helps deal with the high correlation between these two variables, as it performed very similarly to mmr.1a and even had a higher r2 value.

mmr.1.p2a (quadratic)

First we need to update the model by adding in the missing random factor

mmr.1.p2a <- glmmTMB(MAX_MgO2.hr_RESPR ~ 1+ REGION * poly(TEMPERATURE, 2) + MASS_CENTERED + (1|FISH_ID), 
                 family=gaussian(),
                 data = resp4, 
                 REML = TRUE) 

DHARMa residuals
mmr.1a (linear)
mmr.1a %>% simulateResiduals(plot=TRUE)

## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help. 
##  
## Scaled residual values: 0.212 0.328 0.456 0.744 0.812 0.852 1 0.7 0.372 0.684 0.692 0.032 0.216 0.204 0.612 0.084 0.452 0.988 0.936 0.848 ...
mmr.1a %>% DHARMa::testResiduals(plot=TRUE)

## $uniformity
## 
##  Asymptotic one-sample Kolmogorov-Smirnov test
## 
## data:  simulationOutput$scaledResiduals
## D = 0.0715, p-value = 0.3293
## alternative hypothesis: two-sided
## 
## 
## $dispersion
## 
##  DHARMa nonparametric dispersion test via sd of residuals fitted vs.
##  simulated
## 
## data:  simulationOutput
## dispersion = 0.94899, p-value = 0.68
## alternative hypothesis: two.sided
## 
## 
## $outliers
## 
##  DHARMa outlier test based on exact binomial test with approximate
##  expectations
## 
## data:  simulationOutput
## outliers at both margin(s) = 3, observations = 176, p-value = 0.1665
## alternative hypothesis: true probability of success is not equal to 0.007968127
## 95 percent confidence interval:
##  0.003529072 0.049003686
## sample estimates:
## frequency of outliers (expected: 0.00796812749003984 ) 
##                                             0.01704545
mmr.1.p2 (quadratic)

mmr.1.p2a %>% simulateResiduals(plot=TRUE) 

## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help. 
##  
## Scaled residual values: 0.228 0.308 0.492 0.728 0.788 0.876 1 0.728 0.356 0.664 0.724 0.04 0.188 0.192 0.644 0.092 0.42 0.98 0.936 0.836 ...
mmr.1.p2a %>% DHARMa::testResiduals(plot=TRUE)

## $uniformity
## 
##  Asymptotic one-sample Kolmogorov-Smirnov test
## 
## data:  simulationOutput$scaledResiduals
## D = 0.076136, p-value = 0.2594
## alternative hypothesis: two-sided
## 
## 
## $dispersion
## 
##  DHARMa nonparametric dispersion test via sd of residuals fitted vs.
##  simulated
## 
## data:  simulationOutput
## dispersion = 0.94042, p-value = 0.608
## alternative hypothesis: two.sided
## 
## 
## $outliers
## 
##  DHARMa outlier test based on exact binomial test with approximate
##  expectations
## 
## data:  simulationOutput
## outliers at both margin(s) = 3, observations = 176, p-value = 0.1665
## alternative hypothesis: true probability of success is not equal to 0.007968127
## 95 percent confidence interval:
##  0.003529072 0.049003686
## sample estimates:
## frequency of outliers (expected: 0.00796812749003984 ) 
##                                             0.01704545

It looks like the model that treats temperature as a second order polynomial does a better job at avoiding high levels of collinearity within the model. The quadratic model will be used moving forward because:

  • The quadratic model performs just as well as the linear model based on the model validation scores (i.e., AIC, BIC, and r2)
  • The quadratic model does a better job at dealing with collinearity that appeared in the model

Partial plots

ggemmeans

plot_model

Model investigation

summary
Estimate StdError Zvalue Pvalue
(Intercept) 15.665104 0.3860064 40.5824948 0.0000000
REGIONLeading -1.540213 0.6378566 -2.4146702 0.0157495
poly(TEMPERATURE, 2)1 14.695118 2.8646793 5.1297602 0.0000003
poly(TEMPERATURE, 2)2 -2.507947 2.8404686 -0.8829342 0.3772718
MASS_CENTERED 313.864413 34.0009493 9.2310485 0.0000000
REGIONLeading:poly(TEMPERATURE, 2)1 -13.979779 4.2142722 -3.3172464 0.0009091
REGIONLeading:poly(TEMPERATURE, 2)2 1.659481 4.1665787 0.3982838 0.6904210
Anova
Chisq Df Pr(>Chisq)
REGION 5.244328 1 0.0220184
poly(TEMPERATURE, 2) 16.284260 2 0.0002910
MASS_CENTERED 85.212257 1 0.0000000
REGION:poly(TEMPERATURE, 2) 11.182851 2 0.0037297
confint
2.5 % 97.5 % Estimate
(Intercept) 14.908545 16.4216626 15.665104
REGIONLeading -2.790389 -0.2900373 -1.540213
poly(TEMPERATURE, 2)1 9.080450 20.3097864 14.695118
poly(TEMPERATURE, 2)2 -8.075163 3.0592692 -2.507947
MASS_CENTERED 247.223777 380.5050495 313.864413
REGIONLeading:poly(TEMPERATURE, 2)1 -22.239601 -5.7199575 -13.979779
REGIONLeading:poly(TEMPERATURE, 2)2 -6.506863 9.8258252 1.659481
Std.Dev.(Intercept)|FISH_ID 1.054358 2.1168357 1.493956
r-squared
R2_conditional R2_marginal optional
0.7829506 0.6683685 FALSE

Pairwise comparisons

emtrends [latitudes]
mmr.1.p2a %>% emtrends(var = "TEMPERATURE", type = "response") %>% pairs(by = "TEMPERATURE") %>% summary(by = NULL, adjust = "tukey", infer=TRUE)

SCROLL TO THE RIGHT -->

The numbers in the leftmost column of the table simply indicate that the slopes assume mean MASS_CENTERED values when looking at differences between latitudinal slopes.

emmeans [latitudes]
mmr.1.p2a %>% emmeans(pairwise ~ TEMPERATURE*REGION, type = "response") %>% pairs(by = "TEMPERATURE") %>% summary(by = NULL, adjust = "tukey", infer=TRUE)
temperature
mmr.1.p2a %>% emmeans(~ TEMPERATURE*REGION, type = "response")  %>% summary(infer=TRUE)
Means - f(temperature)
mmr.1.p2a %>% update(.~1+ REGION * as.factor(TEMPERATURE) + MASS_CENTERED + RESTING_RUNTIME_SECONDS + (1|FISH_ID)) %>% 
  emmeans(~REGION*TEMPERATURE, type = "response") %>% summary(infer=TRUE)
Abs. diff - f(temperature)
mmr.1.p2a %>% update(.~1+ REGION * as.factor(TEMPERATURE) + MASS_CENTERED + RESTING_RUNTIME_SECONDS + (1|FISH_ID)) %>% 
  emmeans(~REGION*TEMPERATURE, type = "response") %>% pairs(by ="REGION") %>% summary(infer=TRUE)
Effect size
mmr.emm <- mmr.1.p2a %>% emmeans(~REGION*TEMPERATURE)
eff_size(mmr.emm, sigma = sigma(mmr.1.p2a), edf=df.residual(mmr.1.p2a))
##  contrast                                                              
##  Core TEMPERATURE29.0965909090909 - Leading TEMPERATURE29.0965909090909
##  effect.size    SE  df lower.CL upper.CL
##        0.825 0.364 174    0.106     1.54
## 
## sigma used for effect sizes: 2.056 
## Confidence level used: 0.95

Summary figure

Conclusion

  • In conclusion, maximum oxygen consumption is significantly positively correlated with temperature, and fish from low latitudes have significantly higher maximum oxygen consumption at elevated temperatures compared to fish from high latitudes.

Absolute aerobic scope

Scenario

For initial details on the experiment performed please read the ReadMe file. In brief, Acanthochromis polyacanthus from two different regions on the Great Barrier Reef (GBR) were tested for metabolic performance at four different temperatures: 27\(^\circ\)C, 28.5\(^\circ\)C, 30\(^\circ\)C, and 31.5\(^\circ\)C. Fish used in this study were collected from two different regions, low-latitude (i.e., Cairns) and high-latitude (i.e., Mackay); within each region fish were collected from a total of three different populations. At each temperature, resting and maximum oxygen consumption were measured for each individual. Absolute aerobic scope was calculated using the following formula:

Absolute aerobic scope = (maximum oxygen consumption - resting oxygen consumption)
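As a minimal sketch of that formula (the values below are made up purely to illustrate the calculation, not data from this experiment):

```r
# Hypothetical MgO2/hr values, purely to illustrate the AAS calculation
rmr <- c(5.2, 6.1, 4.8)    # resting oxygen consumption
mmr <- c(14.8, 15.9, 13.5) # maximum oxygen consumption
aas <- mmr - rmr           # absolute aerobic scope
aas
```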

Individuals were first tested at 27\(^\circ\)C. Water temperature was then increased at a rate of 0.5\(^\circ\)C day^-1 until the next treatment temperature was reached. Fish were then given an additional five days to adjust to the new temperature before aerobic physiology was tested again.

Three traits are included within the aerobic physiology analysis: resting oxygen consumption, maximum oxygen consumption, and absolute aerobic scope. Data for each metric were collected from respirometry experiments recorded via a combination of programs, including AquaResp and PyroScience. Slopes (i.e., resting and maximum oxygen consumption values) were then calculated via the respR [https://januarharianto.github.io/respR/articles/respR.html] package.
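The steepest-slope idea behind the maximum oxygen consumption extraction can be sketched in base R with a rolling regression window (the trace below is hypothetical and the window is shortened for readability; the actual slopes were calculated with respR over sixty-second windows):

```r
# Hypothetical O2 trace (% air saturation at 1-s intervals)
o2 <- c(100, 98.5, 96.8, 95.4, 93.1, 91.9, 90.2)
window <- 3  # window length in samples (60 s in the real analysis)

# Fit a linear regression in every window and keep each slope
slopes <- sapply(seq_len(length(o2) - window + 1), function(i) {
  idx <- i:(i + window - 1)
  unname(coef(lm(o2[idx] ~ idx))[2])  # slope of O2 vs time in this window
})
min(slopes)  # steepest (most negative) decline
```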

Read in the data

Before beginning always make sure that you are working in the correct directory

knitr::opts_knit$set(root.dir=working.dir)

Now we can import the data. Replace import.data with the PATH to your data file. I have secretly labelled my PATH import.data (i.e., import.data = "PATH TO MY FILE").

Load data
resp <- import.data

Data manipulation

Before the data can be analysed it is important to clean up the data file. Below a number of adjustments are made, primarily making sure that columns are treated appropriately as factors, numeric, or time, as well as renaming some columns. Once these changes are made the data are saved into a new dataframe called resp2.

resp2 = resp %>% 
  dplyr::rename(EXP_FISH_ID = FISH_ID) %>%
  separate(EXP_FISH_ID, c("FISH_ID"), remove = FALSE) %>%
  mutate(FISH_ID = factor(FISH_ID), 
         POPULATION = factor(POPULATION), 
         REGION = factor(REGION), 
         TEMPERATURE = as.numeric(TEMPERATURE), #run with temperature as a factor
         RESTING_DATE = factor(RESTING_DATE), 
         RESTING_CHAMBER = factor(RESTING_CHAMBER), 
         RESTING_SYSTEM = factor(RESTING_SYSTEM), 
         RESTING_SUMP = factor(RESTING_SUMP), 
         RESTING_AM_PM = factor(RESTING_AM_PM), 
         RESTING_START_TIME = hms(RESTING_START_TIME),
         RESTING_END_TIME = hms(RESTING_ENDTIME),
         MAX_DATE = factor(MAX_DATE), 
         MAX_CHAMBER = factor(MAX_CHAMBER), 
         MAX_SYSTEM = factor(MAX_SYSTEM), 
         MAX_SUMP = factor(MAX_SUMP), 
         MAX_AM_PM = factor(MAX_AM_PM), 
         MAX_START_TIME = hms(MAX_START_TIME), 
         Swim.performance = factor(Swim.performance), 
         NAS = as.numeric(NAS), 
         FAS = as.numeric(FAS), 
         MgO2.hr_Net = as.numeric(MgO2.hr_Net), 
         RESTING_RUNTIME_SECONDS = as.numeric(hms(RESTING_RUNTIME))) %>% 
  dplyr::rename(MASS = WEIGHT) %>% 
  mutate(MASS_CENTERED = scale(MASS, scale = FALSE, center = TRUE))
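The MASS_CENTERED step relies on scale() with scale = FALSE, which only subtracts the mean (no division by the standard deviation). A quick sketch of the equivalence:

```r
# scale(x, center = TRUE, scale = FALSE) is just x - mean(x)
x <- c(2, 4, 6)
centered <- as.numeric(scale(x, center = TRUE, scale = FALSE))
centered  # same as x - mean(x): deviations from the mean mass
```

Centering mass this way keeps the intercept interpretable as the expected oxygen consumption of an average-mass fish.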

Next, select data points will be removed. Beside each removed data point I have provided the reasoning for its exclusion, such as the fish died during the experiment or the data quality was poor - which likely indicates that there was an issue with the equipment during the trial.

resp3 <- resp2 %>% 
  subset(  
    EXP_FISH_ID !="LCHA127_27" & # deceased during experiment
      EXP_FISH_ID !="LCHA132_27" & # deceased during experiment
      EXP_FISH_ID !="LKES168_27" # poor data quality
  ) 

So far the analysis has been the same as the protocol outlined in the aerobic physiology resting data. One additional data removal step takes place in the maximum oxygen consumption analysis, for samples where fish swam poorly and whose maximum oxygen consumption data are therefore thought to be unreliable. This step is done before any data analysis has taken place.

resp4 <- resp3 %>% 
  subset(
    EXP_FISH_ID !="CSUD008_27" &  # poor swim
      EXP_FISH_ID !="CSUD008_30" &  # poor swim 
      EXP_FISH_ID !="CSUD008_28.5" & # poor swim
      EXP_FISH_ID !="CSUD018_31.5" & # poor swim 
      EXP_FISH_ID !="CSUD026_30" & # max. value low 
      EXP_FISH_ID !="CSUD074_28.5" & # fas value low 
      EXP_FISH_ID !="CSUD079_30" &
      EXP_FISH_ID !="CVLA052_27" & #nas value low 
      EXP_FISH_ID !="CVLA054_28.5" & # low max value? 
      EXP_FISH_ID !="LCHA113_27" & # poor data quality 
      EXP_FISH_ID !="LCHA113_30" & # poor swim 
      EXP_FISH_ID !="LCHA127_27" # deceased during experiment
  ) 

Exploratory data analysis

Mass v AAS
ggplot(resp4, aes(MASS, MgO2.hr_Net)) + 
  geom_point() + 
  geom_smooth(method = "lm") + 
  theme_classic()

Mass v AAS (LATITUDE)
ggplot(resp4, aes(MASS, MgO2.hr_Net, color = REGION)) + 
  geom_point() +
  theme_classic() + 
  geom_smooth(method = "lm")

TEMPERATURE v AAS (LATITUDE)
ggplot(resp4, aes(TEMPERATURE, MgO2.hr_Net, color = REGION)) + 
  geom_point() +
  theme_classic()

Fit the model

The model was fit using the glm and later the glmmTMB package in R. A number of different models were tested to determine which hypothesis and associated variables best predicted absolute aerobic scope. Model fit was examined using AICc, BIC, and r-squared values. Additional models were examined via the validation diagnostics provided by the performance and DHARMa packages in R.

The first set of models tested two different hypotheses: 1) that mass has a major impact on the absolute aerobic scope of fish (this has been well documented in the literature), and 2) that variables related to the time at which experiments were performed have an important impact on the absolute aerobic scope of fish.

Fixed factors (linear regression models)
model 1
#--- base model ---#
nas.1 <- glm(MgO2.hr_Net ~ 1+ REGION * TEMPERATURE + MASS_CENTERED, 
                 family=gaussian(),
                 data = resp4)  

summary

Estimate Std. Error t value Pr(>|t|)
(Intercept) -4.6505476 4.5988210 -1.011248 0.3133266
REGIONLeading 17.8617376 6.8020030 2.625953 0.0094246
TEMPERATURE 0.4921098 0.1581257 3.112143 0.0021773
MASS_CENTERED 167.6469322 24.9382811 6.722473 0.0000000
REGIONLeading:TEMPERATURE -0.6707332 0.2336280 -2.870945 0.0046100
model 2
#--- experimental rmr equipment hypothesis ---#
nas.2 <- glm(MgO2.hr_Net ~ 1+ REGION * TEMPERATURE + MASS_CENTERED + RESTING_SUMP + RESTING_RUNTIME_SECONDS + 
                   RESTING_AM_PM, 
                 family=gaussian(),
                 data = resp4) 

summary

Estimate Std. Error t value Pr(>|t|)
(Intercept) -4.1166901 4.6900539 -0.8777490 0.3813335
REGIONLeading 17.7878857 6.8578591 2.5937958 0.0103304
TEMPERATURE 0.4314945 0.1732534 2.4905397 0.0137265
MASS_CENTERED 167.2794598 25.2897560 6.6145146 0.0000000
RESTING_SUMP2 0.0078043 0.4108966 0.0189932 0.9848690
RESTING_RUNTIME_SECONDS 0.0000818 0.0000936 0.8735816 0.3835932
RESTING_AM_PMPM -0.1069842 0.3960028 -0.2701603 0.7873685
REGIONLeading:TEMPERATURE -0.6678804 0.2355090 -2.8359020 0.0051315
model comparison table
model df AICc BIC r2
nas.1 6 839.8550 858.3808 0.440634
nas.2 9 845.5166 872.9666 0.439167

The model that contains MASS_CENTERED does better than the model that incorporates variables associated with the time at which experiments were performed. This is demonstrated by its lower AICc and BIC scores, as well as its higher r-squared value.

Polynomials
polynomial models

Note that the linear model has already been created as model nas.1 in the previous section.

#--- second order polynomial ---# 
nas.1.p2 <- glm(MgO2.hr_Net ~ 1+ REGION * poly(TEMPERATURE,2) + MASS_CENTERED, 
                 family=gaussian(),
                 data = resp4) 

#--- third order polynomial ---#
nas.1.p3 <- glm(MgO2.hr_Net ~ 1+ REGION * poly(TEMPERATURE, 3) + MASS_CENTERED, 
                 family=gaussian(),
                 data = resp4) 

polynomial model comparisons

model df AICc BIC r2
nas.1 6 839.8550 858.3808 0.4463407
nas.1.p2 8 840.4431 864.9447 0.4580961
nas.1.p3 10 844.9023 875.2738 0.4581327

From our model comparison we can see that there is no additional benefit to including temperature as a 2nd or 3rd order polynomial. However, the linear and quadratic models both perform well.

Random factors

Fish were repeatedly sampled over four different temperatures, so this repeated sampling needs to be accounted for. To do this, random factors will be included within the model. There are a number of options for the random-factor structure, including 1) accounting for repeated sampling of individuals, 2) accounting for repeated sampling of individuals nested within population, and 3) accounting for repeated sampling of individuals and populations without nesting. All three models will be run and compared.

random factor models
nas.1a <- glmmTMB(MgO2.hr_Net ~ 1+ REGION * TEMPERATURE + MASS_CENTERED + (1|FISH_ID), 
                  family=gaussian(),
                  data = resp4,
                  REML = TRUE) 

nas.1b <- glmmTMB(MgO2.hr_Net ~ 1+ REGION * TEMPERATURE + MASS_CENTERED + (1|POPULATION/FISH_ID), 
                  family=gaussian(),
                  data = resp4,
                  REML = TRUE)

nas.1c <- glmmTMB(MgO2.hr_Net ~ 1+ REGION * TEMPERATURE + MASS_CENTERED + (1|FISH_ID) + (REGION|POPULATION), 
                  family=gaussian(),
                  data = resp4,
                  REML = TRUE) # convergence problem

random factor model comparisons

model df AICc BIC r2m r2c
nas.1a 7 828.7221 850.2488 0.4525497 0.5901940
nas.1b 8 830.8106 855.3122 0.4542611 0.5928177
nas.1c 10 NA NA 0.4560622 0.5944637

Model nas.1a appears to be the best model; however, there seems to be little difference between the models regardless of how the random factors are arranged.

Model validation

performance
nas.1a (linear)

The nas.1a model performs well; however, in the model validation performed by the performance package it looks like two variables are highly correlated. If we expand the figure we can see that the highly correlated variables are REGION and REGION:TEMPERATURE. Perhaps this is unsurprising, but let's run the quadratic (2nd order polynomial) model to see if this helps deal with the high correlation between these two variables, as it performed very similarly to nas.1a and even had a higher r2 value.

nas.1.p2a (quadratic)

First we need to update the model by adding in the missing random factor

nas.1.p2a <- glmmTMB(MgO2.hr_Net ~ 1+ REGION * poly(TEMPERATURE,2) + MASS_CENTERED + (1|FISH_ID), 
                 family=gaussian(),
                 data = resp4, 
                 REML=TRUE) 

DHARMa residuals
nas.1a (linear)
nas.1a %>% simulateResiduals(plot=TRUE)

## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help. 
##  
## Scaled residual values: 0.212 0.244 0.556 0.532 0.74 0.728 1 0.712 0.596 0.868 0.788 0.364 0.28 0.416 0.744 0.124 0.676 0.996 0.788 0.864 ...
nas.1a %>% DHARMa::testResiduals(plot=TRUE)

## $uniformity
## 
##  Asymptotic one-sample Kolmogorov-Smirnov test
## 
## data:  simulationOutput$scaledResiduals
## D = 0.063636, p-value = 0.4741
## alternative hypothesis: two-sided
## 
## 
## $dispersion
## 
##  DHARMa nonparametric dispersion test via sd of residuals fitted vs.
##  simulated
## 
## data:  simulationOutput
## dispersion = 0.96215, p-value = 0.744
## alternative hypothesis: two.sided
## 
## 
## $outliers
## 
##  DHARMa outlier test based on exact binomial test with approximate
##  expectations
## 
## data:  simulationOutput
## outliers at both margin(s) = 2, observations = 176, p-value = 0.4095
## alternative hypothesis: true probability of success is not equal to 0.007968127
## 95 percent confidence interval:
##  0.001379164 0.040444565
## sample estimates:
## frequency of outliers (expected: 0.00796812749003984 ) 
##                                             0.01136364
nas.1.p2 (quadratic)

nas.1.p2a %>% simulateResiduals(plot=TRUE) 

## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help. 
##  
## Scaled residual values: 0.268 0.2 0.648 0.472 0.664 0.8 1 0.756 0.488 0.812 0.844 0.4 0.224 0.352 0.792 0.172 0.596 0.996 0.876 0.828 ...
nas.1.p2a %>% DHARMa::testResiduals(plot=TRUE)

## $uniformity
## 
##  Asymptotic one-sample Kolmogorov-Smirnov test
## 
## data:  simulationOutput$scaledResiduals
## D = 0.060045, p-value = 0.5497
## alternative hypothesis: two-sided
## 
## 
## $dispersion
## 
##  DHARMa nonparametric dispersion test via sd of residuals fitted vs.
##  simulated
## 
## data:  simulationOutput
## dispersion = 0.95158, p-value = 0.64
## alternative hypothesis: two.sided
## 
## 
## $outliers
## 
##  DHARMa outlier test based on exact binomial test with approximate
##  expectations
## 
## data:  simulationOutput
## outliers at both margin(s) = 2, observations = 176, p-value = 0.4095
## alternative hypothesis: true probability of success is not equal to 0.007968127
## 95 percent confidence interval:
##  0.001379164 0.040444565
## sample estimates:
## frequency of outliers (expected: 0.00796812749003984 ) 
##                                             0.01136364

It looks like the model that treats temperature as a second order polynomial does a better job at avoiding high levels of collinearity within the model. The quadratic model will be used moving forward because:

  • The quadratic model performs just as well as the linear model based on the model validation scores (i.e., AIC, BIC, and r2)
  • The quadratic model does a better job at dealing with collinearity that appeared in the model

Partial plots

ggemmeans

plot_model

Model investigation

summary
Estimate StdError Zvalue Pvalue
(Intercept) 9.676637 0.3717003 26.033437 0.0000000
REGIONLeading -1.632353 0.6124654 -2.665216 0.0076939
poly(TEMPERATURE, 2)1 10.324322 3.0798654 3.352199 0.0008017
poly(TEMPERATURE, 2)2 -6.546341 3.0573845 -2.141157 0.0322613
MASS_CENTERED 179.593117 32.6469695 5.501066 0.0000000
REGIONLeading:poly(TEMPERATURE, 2)1 -14.500327 4.5325854 -3.199129 0.0013784
REGIONLeading:poly(TEMPERATURE, 2)2 4.897260 4.4894335 1.090841 0.2753427
Anova
Chisq Df Pr(>Chisq)
REGION 6.537818 1 0.0105605
poly(TEMPERATURE, 2) 6.515869 2 0.0384678
MASS_CENTERED 30.261721 1 0.0000000
REGION:poly(TEMPERATURE, 2) 11.460373 2 0.0032465
confint
2.5 % 97.5 % Estimate
(Intercept) 8.9481173 10.4051557 9.676637
REGIONLeading -2.8327627 -0.4319423 -1.632353
poly(TEMPERATURE, 2)1 4.2878974 16.3607477 10.324322
poly(TEMPERATURE, 2)2 -12.5387048 -0.5539778 -6.546341
MASS_CENTERED 115.6062324 243.5800011 179.593117
REGIONLeading:poly(TEMPERATURE, 2)1 -23.3840309 -5.6166225 -14.500327
REGIONLeading:poly(TEMPERATURE, 2)2 -3.9018683 13.6963875 4.897260
Std.Dev.(Intercept)|FISH_ID 0.8790124 1.9917277 1.323160
r-squared
R2_conditional R2_marginal optional
0.6031021 0.4622269 FALSE

Pairwise comparisons

emtrends [latitudes]
nas.1.p2a %>% emtrends(var = "TEMPERATURE", type = "response") %>% pairs(by = "TEMPERATURE") %>% summary(by = NULL, adjust = "tukey", infer=TRUE)

SCROLL TO THE RIGHT -->

The numbers in the leftmost column of the table simply indicate that the slopes assume mean MASS_CENTERED values when looking at differences between latitudinal slopes.

emmeans [latitudes]
nas.1.p2a %>% emmeans(pairwise ~ TEMPERATURE*REGION, type = "response") %>% pairs(by = "TEMPERATURE") %>% summary(by = NULL, adjust = "tukey", infer=TRUE)
temperature
nas.1.p2a %>% emmeans(~ TEMPERATURE*REGION, type = "response")  %>% summary(infer=TRUE)
Means - f(temperature)
nas.1.p2a %>% update(.~1+ REGION * as.factor(TEMPERATURE) + MASS_CENTERED + RESTING_RUNTIME_SECONDS + (1|FISH_ID)) %>% 
  emmeans(~REGION*TEMPERATURE, type = "response") %>% summary(infer=TRUE)
Abs. diff - f(temperature)
nas.1.p2a %>% update(.~1+ REGION * as.factor(TEMPERATURE) + MASS_CENTERED + RESTING_RUNTIME_SECONDS + (1|FISH_ID)) %>% 
  emmeans(~REGION*TEMPERATURE, type = "response") %>% pairs(by ="REGION") %>% summary(infer=TRUE)
Effect size
aas.emm <- nas.1.p2a %>% emmeans(~REGION*TEMPERATURE)
eff_size(aas.emm, sigma = sigma(nas.1.p2a), edf=df.residual(nas.1.p2a))
##  contrast                                                              
##  Core TEMPERATURE29.0965909090909 - Leading TEMPERATURE29.0965909090909
##  effect.size    SE  df lower.CL upper.CL
##        0.941 0.336 174    0.278     1.61
## 
## sigma used for effect sizes: 2.221 
## Confidence level used: 0.95

Summary figure

## Warning: Removed 4 rows containing missing values (`geom_point()`).

ggplot(rmr.emm.df, aes(y=emmean, x=TEMPERATURE, color=REGION, linetype=REGION)) + 
  geom_jitter(data=rmr.obs, aes(y=Fit, color=REGION), width=0.05, alpha = 0.3) + 
  stat_smooth(method = "lm", formula = y ~ poly(x, 2, raw=TRUE)) + 
  geom_ribbon(aes(x=TEMPERATURE, ymin=lower.CL, ymax=upper.CL, fill=REGION), alpha = 0.2, color=NA) + 
  scale_x_continuous(limits = c(26.9, 31.6), breaks = seq(27, 31.5, by = 1.5)) + 
  scale_y_continuous(limits = c(2, 12), breaks = seq(2, 12, by = 2)) + 
  theme_classic() + 
  ylab(expression("RESTING METABOLIC RATE (MgO"[2]*" hr"^{-1}*")")) + 
  xlab("") + 
  scale_linetype_manual(values = c("solid", "dashed"), labels = c("Low-latitude", "High-latitude")) + 
  scale_color_manual(values = c("#B2182B", "#4393C3"), labels = c("Low-latitude", "High-latitude")) + 
  scale_fill_manual(values = c("#B2182B", "#4393C3"), labels = c("Low-latitude", "High-latitude")) + 
  theme(legend.position = "top", 
        legend.text = element_text(size = 10), 
        legend.title = element_blank(), 
        axis.title = element_text(size = 12), 
        axis.text = element_text(size = 10)) + 
  annotate("text", x = 30, y = 11.5, label = "P =0.51", fontface = "italic", size = 5)

Conclusion

  • In conclusion, maximum oxygen consumption is significantly positively correlated with temperature, and fish from low latitudes have significantly higher maximum oxygen consumption at elevated temperatures than fish from high latitudes.

Enzymes

Lactate dehydrogenase

Scenario

For initial details on the experiment performed please read the ReadMe file. In brief, Acanthochromis polyacanthus from two different regions on the Great Barrier Reef (GBR) were tested for metabolic performance at four different temperatures: 27\(^\circ\)C, 28.5\(^\circ\)C, 30\(^\circ\)C, and 31.5\(^\circ\)C. Fish used in this study were collected from two different regions, low-latitude (i.e., Cairns) and high-latitude (i.e., Mackay); within each region fish were collected from three different populations. After metabolic performance was tested, blood and tissue samples were collected. White muscle tissue samples were used to examine the relationship between enzyme activity and temperature for two different enzymes, Lactate Dehydrogenase (LDH; anaerobic) and Citrate Synthase (CS; aerobic). Enzyme activity was measured at four different temperatures: 20\(^\circ\)C, 30\(^\circ\)C, 40\(^\circ\)C, and 50\(^\circ\)C. Enzyme activity was measured using a spectrophotometer, and wavelength absorption levels were recorded using the software program LabX.

Read in the data

Before beginning always make sure that you are working in the correct directory

knitr::opts_knit$set(root.dir=working.dir)

Now we can import the data. Two different data frames are being imported. The first contains all the enzyme wavelength absorption data for each sample, and the tissue.mass data file contains information about the tissue sample that was used for each run. Later on these two data frames will be merged.

Load data

ldh <- read_delim("./enzymes/LDH_LocalAdapt.txt", delim = "\t", 
                  escape_double = FALSE, col_types = cols(`Creation time` = col_datetime(format = "%d/%m/%Y %H:%M")), 
                  trim_ws = TRUE)
tissue.mass <- read.delim("C:/Users/jc527762/OneDrive - James Cook University/PhD dissertation/Data/Chapter1_LocalAdaptation/enzymes/tissue_mass.txt")

Data manipulation

Before the data can be analysed it is important to clean up the data file. I won’t explain every step; those details can be worked out by examining the different functions. The main steps occurring below are columns being broken apart to make new columns (the separate function) and columns being combined to make a unique_sample_Id value. Time data can also be tricky to deal with in R, so a number of data manipulation steps are used to make sure that time is read properly.
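The two reshaping operations doing most of the work here can be sketched in base R with made-up values (`unite()` and `separate()` apply the same logic across whole columns at once):

```r
# base-R sketch of the two key operations, using illustrative values
fish_id <- "CSUD008"; temperature <- 20; sample_index <- 1

# unite(): combine several columns into one UNIQUE_SAMPLE_ID
unique_sample_id <- paste(fish_id, temperature, sample_index, sep = "_")
unique_sample_id   # "CSUD008_20_1"

# separate(): split a creation-time string into DATE and TIME parts
creation_time <- "01/02/2021 10:30"           # hypothetical timestamp
parts <- strsplit(creation_time, " ")[[1]]    # parts[1] = DATE, parts[2] = TIME
```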

#--- data preparation/manipulation ---# 
ldh2 <- ldh %>%
  clean_names() %>%
  mutate(muscle_type = str_replace(muscle_type, " ", ".")) %>%
  unite("UNIQUE_SAMPLE_ID", c(fish_id,temperature,sample_index), sep="_", remove = FALSE) %>% 
  separate(creation_time, into=c('DATE','TIME'), sep = " ", remove = FALSE) %>% 
  arrange(sample_id_1, DATE, TIME) 

ldh3 <- ldh2 %>% 
  mutate(TIME = hms(ldh2$TIME)) %>% 
  mutate(TIME = chron(times=ldh2$TIME)) %>% 
  arrange(TIME) %>%
  group_by(UNIQUE_SAMPLE_ID, sample_id_1) %>% 
  mutate(TIME_DIFF = TIME - first(TIME)) %>% 
  filter(TIME != first(TIME)) %>%
  ungroup() %>% 
  mutate(TIME_DIFF_SECS = period_to_seconds(hms(TIME_DIFF))) %>% 
  mutate(MINUTES = TIME_DIFF_SECS/60) %>% 
  mutate(MINUTES = round(MINUTES, digits = 2)) %>% 
  dplyr::rename(CUVETTE = sample_id_1) %>% 
  mutate(REGION = substr(fish_id, 1, 1 ), 
         POPULATION = substr(fish_id, 2, 4), 
         SAMPLE_NO = substr(fish_id, 5, 7)) %>% 
  mutate(REGION = case_when( REGION =="L"~ "Leading", 
                             REGION == "C" ~ "Core", 
                             TRUE ~ "na"))   

Data cleaning

Next, select data points will be removed. Data points were checked using a previously written app script that lets the user inspect the plotted points of each sample run. Because we are interested in the slope, points that plateaued were removed from the analysis, as a plateau signifies that the reaction ran out of ‘fuel’: for LDH the ‘fuel’ is NADH, and for CS it is oxaloacetic acid. Samples that were removed were placed into one of five different groups, listed below:

  • grp1: removed the last data point, which was causing a plateau
  • grp2: removed the last 2 data points, which were causing a plateau
  • grp3: removed the last 3 data points, which were causing a plateau
  • grp4: removed the last 4 data points, which were causing a plateau
  • grp5: removed the last 5 data points, which were causing a plateau
grp1 <- c("CSUD008_20_1","CSUD008_20_2","CSUD008_20_3","CSUD008_20_4","CSUD008_20_5","CSUD008_20_6", 
          "CVLA047_50_1","CVLA047_50_2","CVLA047_50_3","CVLA047_50_4","CVLA047_50_5","CVLA047_50_6", 
          "CVLA046_50_1","CVLA046_50_2","CVLA046_50_3","CVLA046_50_4","CVLA046_50_5","CVLA046_50_6") 
grp2 <- c("LCKM180_30_1","LCKM180_30_2","LCKM180_30_3","LCKM180_30_4","LCKM180_30_5","LCKM180_30_6", 
          "LKES172_50_1","LKES172_50_2","LKES172_50_3","LKES172_50_4","LKES172_50_5","LKES172_50_6", 
          "LCHA114_50_1","LCHA114_50_2","LCHA114_50_3","LCHA114_50_4","LCHA114_50_5","LCHA114_50_6", 
          "CSUD074_50_1","CSUD074_50_2","CSUD074_50_3","CSUD074_50_4","CSUD074_50_5","CSUD074_50_6")
grp3 <- c("LCKM165_50_1","LCKM165_50_2","LCKM165_50_3","LCKM165_50_4","LCKM165_50_5","LCKM165_50_6", 
          "LCKM163_50_1","LCKM163_50_2","LCKM163_50_3","LCKM163_50_4","LCKM163_50_5","LCKM163_50_6", 
          "CTON068_50_1","CTON068_50_2","CTON068_50_3","CTON068_50_4","CTON068_50_5","CTON068_50_6", 
          "CVLA104_50_1","CVLA104_50_2","CVLA104_50_3","CVLA104_50_4","CVLA104_50_5","CVLA104_50_6") 
grp4 <- c("LCHA135_50_1","LCHA135_50_2","LCHA135_50_3","LCHA135_50_4","LCHA135_50_5","LCHA135_50_6", 
          "CTON069_50_1","CTON069_50_2","CTON069_50_3","CTON069_50_4","CTON069_50_5","CTON069_50_6", 
          "CVLA045_50_1","CVLA045_50_2","CVLA045_50_3","CVLA045_50_4","CVLA045_50_5","CVLA045_50_6") 
grp5 <- c("CSUD014_50_1","CSUD014_50_2","CSUD014_50_3","CSUD014_50_4","CSUD014_50_5","CSUD014_50_6", 
          "CTON110_50_1","CTON110_50_2","CTON110_50_3","CTON110_50_4","CTON110_50_5","CTON110_50_6")  

For some samples entire runs on certain cuvettes were poor. These samples were removed below, as well as samples from each grp outlined above:

ldh3.filtered <- ldh3 %>% 
  filter(!(UNIQUE_SAMPLE_ID %in% c("LCKM154_20_1", 
                                   "LKES143_30_3", 
                                   "LKES143_20_2", 
                                   "CSUD010_40_2"))) %>% 
  group_by(UNIQUE_SAMPLE_ID) %>% 
  arrange(UNIQUE_SAMPLE_ID, TIME) %>% 
  filter(!(UNIQUE_SAMPLE_ID %in% grp1 & row_number() > (n() - 1))) %>% 
  filter(!(UNIQUE_SAMPLE_ID %in% grp2 & row_number() > (n() - 2))) %>% 
  filter(!(UNIQUE_SAMPLE_ID %in% grp3 & row_number() > (n() - 3))) %>% 
  filter(!(UNIQUE_SAMPLE_ID %in% grp4 & row_number() > (n() - 4))) %>% 
  filter(!(UNIQUE_SAMPLE_ID %in% grp5 & row_number() > (n() - 5))) %>% 
  ungroup() %>% 
  mutate(UNIQUE_SAMPLE_ID = str_sub(UNIQUE_SAMPLE_ID, end = -3)) 
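The tail-trimming filters above combine `row_number()` and `n()` within each group; for a single vector the same trim is just `head()` with a negative count. A minimal sketch with made-up readings:

```r
# toy absorbance readings for one grp2 cuvette; the last 2 points plateaued
readings <- c(1.00, 0.91, 0.83, 0.76, 0.75, 0.75)

# equivalent of filter(!(row_number() > n() - 2)): drop the final 2 rows
trimmed <- head(readings, -2)
trimmed   # 1.00 0.91 0.83 0.76
```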

Great! Now we have all the data points that we want to keep. However, the data needs to be manipulated so that we can pull slopes out of the absorption readings and then calculate enzyme activity based on those slopes. This will involve a number of steps.

Data calculations

Step1: Extract slopes

Step1 will produce a data frame with the slope obtained for cuvettes 1-3 for each sample run at each experimental temperature.
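For a single cuvette, the slope extraction inside `do()` amounts to fitting `lm(result ~ MINUTES)` and keeping the slope coefficient. A minimal sketch with made-up readings:

```r
# hypothetical absorbance readings declining exactly 0.3 units per minute
toy <- data.frame(MINUTES = c(0.5, 1.0, 1.5, 2.0),
                  result  = c(1.20, 1.05, 0.90, 0.75))
mod <- lm(result ~ MINUTES, data = toy)
slope <- unname(coef(mod)["MINUTES"])
slope   # -0.3
```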

LDH_activity <- ldh3.filtered %>% 
  group_by(UNIQUE_SAMPLE_ID, CUVETTE) %>% 
  do({
    mod = lm(result ~ MINUTES, data = .)
    data.frame(Intercept = coef(mod)[1],
               Slope = coef(mod)[2], 
               r2 = summary(mod)$adj.r.squared)
  }) %>%
  ungroup() %>%
  filter(CUVETTE != ("6"))%>% 
  filter(CUVETTE != ("4"))%>% 
  filter(CUVETTE != ("5")) %>% 
  filter(Slope <= 0)
Step2: Slope means

Step2 will calculate the mean slope across cuvettes 1-3.

LDH_activity_means <- LDH_activity %>% 
  group_by(UNIQUE_SAMPLE_ID) %>% 
  dplyr::mutate(Mean = mean(Slope)) %>% 
  ungroup()
Step3: Background activity level

Step3 will calculate the background activity level by measuring the slope from cuvette 5 (positive control).

LDH_background <- ldh3 %>% 
  group_by(UNIQUE_SAMPLE_ID, CUVETTE) %>% 
  do({
    mod = lm(result ~ MINUTES, data = .)
    data.frame(Intercept = coef(mod)[1],
               Slope = coef(mod)[2])
  }) %>%
  ungroup() %>%
  filter(CUVETTE == ("5")) %>% 
  dplyr::rename(Background = Slope) %>% 
  mutate(UNIQUE_SAMPLE_ID = str_sub(UNIQUE_SAMPLE_ID, end = -3)) 
Step4: Merging dataframes

Step4 will merge the data frames that you created with the mean slopes and the background slopes.

final_table <- LDH_activity %>% 
  full_join(distinct(LDH_activity_means[,c(1,6)]), by = "UNIQUE_SAMPLE_ID") %>% 
  full_join(LDH_background[,c(1,4)], by = "UNIQUE_SAMPLE_ID") 
final_table$Mean[duplicated(final_table$Mean)] <- ""
final_table$Background[duplicated(final_table$Background)] <- ""
final_table <- final_table %>% 
  mutate(Mean = as.numeric(Mean), 
         Background = as.numeric(Background), 
         Background_perc = Background/Mean) 
Step5: Enzyme activity levels

Step5 is where enzyme activity levels are calculated. See further details in the manuscript (doi: xxx). Within this step the background activity level is taken into account: it is subtracted from sample slopes whenever the background was more than 5% of the sample slope.
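As a worked example of this rule (made-up slopes, but the assay constants used in the pipeline):

```r
# hypothetical slopes: background is ~2.7% of the sample slope, i.e. under
# the 5% threshold, so it is zeroed out rather than subtracted
mean_slope <- -0.75    # mean absorbance slope for cuvettes 1-3 (per minute)
background <- -0.02    # slope of cuvette 5 (positive control)
background_perc <- background / mean_slope

background2 <- if (background_perc <= 0.05) 0 else background
ldh_absorbance <- mean_slope - background2

# assay constants: path length (cm), NADH extinction coefficient at 340 nm,
# tissue concentration, assay and sample volumes
path_length <- 1; ext_coef <- 6.22; tissue_conc <- 0.005
assay_vol <- 2.975; sample_vol <- 0.025

# NADH is consumed, so the slope is negative; the final *-1 flips the sign
ldh_activity <- (ldh_absorbance / (path_length * ext_coef * tissue_conc)) *
  (assay_vol / sample_vol) * -1
ldh_activity   # ~2869.77
```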

ldh.data <- final_table %>% 
  select(c(UNIQUE_SAMPLE_ID, Mean, Background, Background_perc)) %>% 
  mutate(Mean = as.numeric(Mean), 
         Background = as.numeric(Background), 
         Background_perc = as.numeric(Background_perc)) %>% 
  mutate(Background2 = case_when(Background_perc <= 0.05 ~ 0, 
                                    TRUE ~ Background), 
         LDH_ABSORBANCE = Mean - Background2) %>%
  drop_na() %>% 
  inner_join(select(ldh3.filtered, c(UNIQUE_SAMPLE_ID, REGION, POPULATION, temperature, fish_id)), by ="UNIQUE_SAMPLE_ID") %>% 
  inner_join(tissue.mass, by = "fish_id") %>% 
  mutate(TISSUE_MASS_CENTERED = scale(TISSUE_MASS, center = TRUE, scale = FALSE)) %>%
  distinct(UNIQUE_SAMPLE_ID, REGION, POPULATION, .keep_all = TRUE) %>% 
  mutate(PATH_LENGTH = 1, 
         EXTINCTION_COEFFICIENT = 6.22, 
         TISSUE_CONCENTRATION = 0.005, 
         ASSAY_VOL = 2.975, 
         SAMPLE_VOL = 0.025, 
         LDH_ACTIVITY = ((LDH_ABSORBANCE/(PATH_LENGTH*EXTINCTION_COEFFICIENT*TISSUE_CONCENTRATION))*(ASSAY_VOL/SAMPLE_VOL))*-1) %>% 
  filter(LDH_ACTIVITY >=0)

By the end of this stage you should have a data frame that includes a column called LDH_ACTIVITY along with the necessary metadata - this data frame will be used to perform the statistical analysis.

Exploratory data analysis

LDH v TEMPERATURE [LATITUDE]
ggplot(ldh.data, aes(x =as.numeric(temperature), y= LDH_ACTIVITY, color = REGION)) + 
  geom_point() + geom_smooth(method = "lm", se=FALSE)

LDH V TEMPERATURE [DENSITY]
ggplot(ldh.data, aes(x = LDH_ACTIVITY, fill = temperature, color = temperature)) + 
  geom_density(alpha =0.5, position = "identity") 

LDH v TISSUE MASS (LATITUDE)
ggplot(ldh.data, aes(x =TISSUE_MASS_CENTERED, y= LDH_ACTIVITY, color = REGION)) + 
  geom_point() + geom_smooth(method = "lm", se=FALSE)

Fit the model

The model was fit using the glm function and later the glmmTMB package in R. A number of different models were tested to determine which hypothesis and associated variables best predicted LDH enzyme activity. Model fit was examined using AICc, BIC, and r-squared values. Models were additionally examined via the validation diagnostics provided by the performance and DHARMa packages in R.
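AICc is AIC with a small-sample correction term; packages such as MuMIn compute it directly, but the arithmetic is simple enough to sketch by hand on a simulated gaussian glm:

```r
# simulate a small dataset and fit a gaussian glm
set.seed(1)
d <- data.frame(x = 1:20)
d$y <- 2 * d$x + rnorm(20)
m <- glm(y ~ x, family = gaussian(), data = d)

k <- attr(logLik(m), "df")   # parameters estimated: intercept, slope, sigma
n <- nobs(m)

# AICc = AIC + 2k(k + 1) / (n - k - 1); the correction vanishes as n grows
aicc <- AIC(m) + (2 * k * (k + 1)) / (n - k - 1)
```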

Fixed factors (linear regression models)
model 1
#--- base model ---#
ldh.model.1 <- glm(LDH_ACTIVITY ~ 1 + REGION*temperature + TISSUE_MASS_CENTERED, 
                       family=gaussian(), 
                       data = ldh.data)  

summary

Estimate Std. Error t value Pr(>|t|)
(Intercept) -100.7330611 12.3696296 -8.1435794 0.0000000
REGIONLeading -13.9358312 18.7015018 -0.7451718 0.4573740
temperature 6.4321810 0.3368483 19.0951868 0.0000000
TISSUE_MASS_CENTERED -1.4968191 0.5843444 -2.5615359 0.0114421
REGIONLeading:temperature 0.3442058 0.5071223 0.6787431 0.4983825
model 2
ldh.model.2 <- glm(LDH_ACTIVITY ~ 1 + REGION*temperature, 
                       family=gaussian(), 
                       data = ldh.data)

summary

Estimate Std. Error t value Pr(>|t|)
(Intercept) -101.8239560 12.5955494 -8.0841218 0.0000000
REGIONLeading -11.4789921 19.0292884 -0.6032276 0.5472932
temperature 6.4245334 0.3431905 18.7200209 0.0000000
REGIONLeading:temperature 0.3602315 0.5166515 0.6972429 0.4867596
model comparison table
model df AICc BIC r2
ldh.model.1 6 1495.243 1512.719 0.8226257
ldh.model.2 5 1499.711 1514.347 0.8156748

The model that contains TISSUE_MASS_CENTERED seems to do better than the model that leaves TISSUE_MASS_CENTERED out. Therefore we will move ahead with the model that contains TISSUE_MASS_CENTERED as a co-variate.

Polynomials

polynomial models

Note that the linear model has already been created via model ldh.model.1 in the previous section.

#--- second order polynomial ---# 
ldh.model.1.p2 <- glm(LDH_ACTIVITY ~ 1 + REGION*poly(temperature, 2) + TISSUE_MASS_CENTERED, 
                       family=gaussian(), 
                       data = ldh.data)  

#--- third order polynomial ---# 
ldh.model.1.p3 <- glm(LDH_ACTIVITY ~ 1 + REGION*poly(temperature, 3) + TISSUE_MASS_CENTERED, 
                          family=gaussian(), 
                          data = ldh.data)  
polynomial model comparisons
model df AICc BIC r2
ldh.model.1 6 1495.243 1512.719 0.8265616
ldh.model.1.p2 8 1488.592 1511.656 0.8389161
ldh.model.1.p3 10 1487.815 1516.338 0.8445486

From our model comparison we can see that the model that fits temperature as a third order polynomial performs best. Therefore, moving forward we will use the third order polynomial model.

Random factors

Fish were repeatedly sampled over four different temperatures, so this repeated sampling needs to be accounted for by including random factors in the model. There are a number of options for the random-effect structure, including: 1) accounting for repeated sampling of individuals, 2) accounting for repeated sampling of individuals nested within populations, and 3) accounting for repeated sampling of individuals and populations without nesting. All three models will be run and compared.

random factor models
ldh.model.1.p3a <- glmmTMB(LDH_ACTIVITY ~ 1 + REGION*poly(temperature, 3) + TISSUE_MASS_CENTERED + (1|fish_id), 
                       family=gaussian(), 
                       data = ldh.data, 
                       REML = TRUE) 

ldh.model.1.p3b <- glmmTMB(LDH_ACTIVITY ~ 1 + REGION*poly(temperature, 3) + TISSUE_MASS_CENTERED + (1|POPULATION/fish_id), 
                  family=gaussian(), 
                  data = ldh.data, 
                  REML = TRUE) 

ldh.model.1.p3c <- glmmTMB(LDH_ACTIVITY ~ 1 + REGION*poly(temperature, 3) + TISSUE_MASS_CENTERED + (1|fish_id) + (1 + REGION|POPULATION), 
                       family=gaussian(), 
                       data = ldh.data, 
                       REML = TRUE) # convergence problem
random factor model comparisons
model df AICc BIC r2m r2c
ldh.model.1.p3a 11 1344.532 1375.736 0.8365953 0.8365953
ldh.model.1.p3b 12 1346.897 1380.747 0.8365966 0.8365966
ldh.model.1.p3c 14 NA NA 0.8365957 0.8365957

Model ldh.model.1.p3a appears to be the best model; however, there is little difference between the models regardless of how the random factors are arranged.

Model validation

performance
ldh.model.1.p3a (3rd order polynomial)

The ldh.model.1.p3a model looks like it performs well.

DHARMa residuals
ldh.model.1.p3a (3rd order polynomial)
ldh.model.1.p3a %>% simulateResiduals(plot=TRUE)

## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help. 
##  
## Scaled residual values: 0.184 0.052 0 0.008 0.376 0.324 0.392 0.652 0.396 0.716 0.552 0.624 0.82 0.888 0.948 0.78 0.36 0.244 0.948 0.3 ...
ldh.model.1.p3a %>% DHARMa::testResiduals(plot=TRUE)

## $uniformity
## 
##  Asymptotic one-sample Kolmogorov-Smirnov test
## 
## data:  simulationOutput$scaledResiduals
## D = 0.10533, p-value = 0.07169
## alternative hypothesis: two-sided
## 
## 
## $dispersion
## 
##  DHARMa nonparametric dispersion test via sd of residuals fitted vs.
##  simulated
## 
## data:  simulationOutput
## dispersion = 0.95832, p-value = 0.8
## alternative hypothesis: two.sided
## 
## 
## $outliers
## 
##  DHARMa outlier test based on exact binomial test with approximate
##  expectations
## 
## data:  simulationOutput
## outliers at both margin(s) = 2, observations = 150, p-value = 0.3359
## alternative hypothesis: true probability of success is not equal to 0.007968127
## 95 percent confidence interval:
##  0.001618827 0.047333019
## sample estimates:
## frequency of outliers (expected: 0.00796812749003984 ) 
##                                             0.01333333

The model performs well and passes validation checks.

Partial plots

ggemmeans

plot_model

Model investigation

summary
Estimate StdError Zvalue Pvalue
(Intercept) 124.870515 6.3516685 19.6594823 0.0000000
REGIONLeading -2.072650 9.6125044 -0.2156202 0.8292838
poly(temperature, 3)1 882.397643 25.4584946 34.6602444 0.0000000
poly(temperature, 3)2 88.450190 25.6181936 3.4526318 0.0005551
poly(temperature, 3)3 -80.670632 25.7830163 -3.1288283 0.0017550
TISSUE_MASS_CENTERED -1.519811 0.9641605 -1.5763054 0.1149554
REGIONLeading:poly(temperature, 3)1 48.117770 38.4025660 1.2529832 0.2102118
REGIONLeading:poly(temperature, 3)2 40.901232 38.3469874 1.0666088 0.2861485
REGIONLeading:poly(temperature, 3)3 12.973651 38.2874780 0.3388484 0.7347239
Anova
Chisq Df Pr(>Chisq)
REGION 0.0456565 1 0.8308014
poly(temperature, 3) 2297.2292818 3 0.0000000
TISSUE_MASS_CENTERED 2.4847387 1 0.1149554
REGION:poly(temperature, 3) 2.8373979 3 0.4173805
confint
2.5 % 97.5 % Estimate
(Intercept) 112.421473 137.3195565 124.870515
REGIONLeading -20.912812 16.7675122 -2.072650
poly(temperature, 3)1 832.499911 932.2953755 882.397643
poly(temperature, 3)2 38.239453 138.6609266 88.450190
poly(temperature, 3)3 -131.204416 -30.1368487 -80.670632
TISSUE_MASS_CENTERED -3.409531 0.3699084 -1.519811
REGIONLeading:poly(temperature, 3)1 -27.149876 123.3854163 48.117770
REGIONLeading:poly(temperature, 3)2 -34.257482 116.0599465 40.901232
REGIONLeading:poly(temperature, 3)3 -62.068427 88.0157286 12.973651
Std.Dev.(Intercept)|fish_id 20.943652 35.4712452 27.256144
r-squared
R2_conditional R2_marginal optional
0.9464824 0.8365953 FALSE

Pairwise comparisons

emtrends [latitudes]
ldh.model.1.p3a  %>% emtrends(var = "temperature", type = "response") %>% pairs(by = "temperature") %>% summary(by = NULL, adjust = "tukey", infer=TRUE)

SCROLL TO THE RIGHT -->

The numbers in the leftmost column of the table indicate that the slopes assume mean TISSUE_MASS_CENTERED values when comparing latitudinal slopes.

emmeans [latitudes]
ldh.model.1.p3a  %>% emmeans(pairwise ~ temperature*REGION, type = "response") %>% pairs(by = "temperature") %>% summary(by = NULL, adjust = "tukey", infer=TRUE)
temperature
ldh.model.1.p3a  %>% emmeans(~ temperature*REGION, type = "response")  %>% summary(infer=TRUE)
Means - f(temperature)
ldh.model.1.p3a  %>% update(.~1+ REGION * as.factor(temperature) + TISSUE_MASS_CENTERED + (1|fish_id)) %>% 
  emmeans(~REGION*temperature, type = "response") %>% summary(infer=TRUE)
Abs. diff - f(temperature)
ldh.model.1.p3a  %>% update(.~1+ REGION * as.factor(temperature) + TISSUE_MASS_CENTERED + (1|fish_id)) %>% 
  emmeans(~REGION*temperature, type = "response") %>% pairs(by ="REGION") %>% summary(infer=TRUE)
Effect size
ldh.emm <- ldh.model.1.p3a %>% emmeans(~REGION*temperature)
eff_size(ldh.emm, sigma = sigma(ldh.model.1.p3a), edf=df.residual(ldh.model.1.p3a))
##  contrast                                                              
##  Core temperature35.0666666666667 - Leading temperature35.0666666666667
##  effect.size    SE  df lower.CL upper.CL
##        0.331 0.546 148   -0.748     1.41
## 
## sigma used for effect sizes: 19.02 
## Confidence level used: 0.95

Summary figure

## Warning: Removed 6 rows containing missing values (`geom_point()`).

Conclusion

  • In conclusion, LDH enzyme activity is significantly positively correlated with temperature; however, there is no significant difference in the temperature-LDH activity relationship between fish from low and high latitudes.

Citrate synthase

Scenario

For initial details on the experiment performed please read the ReadMe file. In brief, Acanthochromis polyacanthus from two different regions on the Great Barrier Reef (GBR) were tested for metabolic performance at four different temperatures: 27\(^\circ\)C, 28.5\(^\circ\)C, 30\(^\circ\)C, and 31.5\(^\circ\)C. Fish used in this study were collected from two different regions, low-latitude (i.e., Cairns) and high-latitude (i.e., Mackay); within each region fish were collected from three different populations. After metabolic performance was tested, blood and tissue samples were collected. White muscle tissue samples were used to examine the relationship between enzyme activity and temperature for two different enzymes, Lactate Dehydrogenase (LDH; anaerobic) and Citrate Synthase (CS; aerobic). Enzyme activity was measured at four different temperatures: 20\(^\circ\)C, 30\(^\circ\)C, 40\(^\circ\)C, and 50\(^\circ\)C. Enzyme activity was measured using a spectrophotometer, and wavelength absorption levels were recorded using the software program LabX.

Before beginning always make sure that you are working in the correct directory

knitr::opts_knit$set(root.dir=working.dir)

Now we can import the data. Two different data frames are being imported. The first contains all the enzyme wavelength absorption data for each sample, and the tissue.mass data file contains information about the tissue sample that was used for each run. Later on these two data frames will be merged.

Load data

cs <- read_delim("C:/Users/jc527762/OneDrive - James Cook University/PhD dissertation/Data/Chapter1_LocalAdaptation/enzymes/CS_LocalAdapt6.txt", 
                             delim = "\t", escape_double = FALSE, 
                             col_types = cols(...21 = col_skip(), 
                                              ...22 = col_skip()), trim_ws = TRUE) %>% 
  clean_names() %>% 
  mutate(creation_time = as.POSIXct(creation_time, format = "%d/%m/%Y %H:%M:%S"))
tissue.mass <- read.delim("C:/Users/jc527762/OneDrive - James Cook University/PhD dissertation/Data/Chapter1_LocalAdaptation/enzymes/tissue_mass.txt") %>% 
  dplyr::rename(FISH_ID = fish_id)

Data manipulation

Before the data can be analysed it is important to clean up the data file. I won’t explain every step; those details can be worked out by examining the different functions. The main steps occurring below are columns being broken apart to make new columns (the separate function) and columns being combined to make a unique_sample_Id value. Time data can also be tricky to deal with in R, so a number of data manipulation steps are used to make sure that time is read properly.

#--- data preparation/manipulation ---# 
cs2 <- cs %>%
  mutate(muscle_type = str_replace(muscle_type, " ", ".")) %>%
  unite("UNIQUE_SAMPLE_ID", c(fish_id,temperature,sample_index), sep="_", remove = FALSE) %>% 
  separate(creation_time, into=c('DATE','TIME'), sep = " ", remove = FALSE) %>% 
  arrange(sample_id_1, DATE, TIME) 

cs3 <- cs2 %>% 
  mutate(DATE = as.Date(creation_time), 
         TIME = format(creation_time, "%H:%M:%S")) %>%
  mutate(TIME = hms(cs2$TIME)) %>% 
  mutate(TIME = chron(times=cs2$TIME)) %>% 
  arrange(TIME) %>%
  group_by(UNIQUE_SAMPLE_ID, sample_id_1) %>% 
  mutate(TIME_DIFF = TIME - first(TIME)) %>% 
  filter(TIME != first(TIME)) %>%
  ungroup() %>% 
  mutate(TIME_DIFF_SECS = period_to_seconds(hms(TIME_DIFF))) %>% 
  mutate(MINUTES = TIME_DIFF_SECS/60) %>% 
  mutate(MINUTES = round(MINUTES, digits = 2)) %>% 
  dplyr::rename(CUVETTE = sample_id_1) %>% 
  mutate(REGION = substr(fish_id, 1, 1 ), 
         POPULATION = substr(fish_id, 2, 4), 
         SAMPLE_NO = substr(fish_id, 5, 7)) %>% 
  mutate(REGION = case_when( REGION =="L"~ "Leading", 
                             REGION == "C" ~ "Core", 
                             TRUE ~ "na"))   

Data cleaning

Next, select data points will be removed. Data points were checked using a previously written app script that lets the user inspect the plotted points of each sample run. Because we are interested in the slope, points that plateaued were removed from the analysis, as a plateau signifies that the reaction ran out of ‘fuel’: for LDH the ‘fuel’ is NADH, and for CS it is oxaloacetic acid. For LDH, removed samples were placed into one of five different groups; no reactions reached a plateau for CS, but there were a number of runs and/or cuvettes where data quality was poor.

cs3.filtered <- cs3 %>% 
  dplyr::rename(TEMPERATURE = temperature, 
         FISH_ID = fish_id) %>%
  filter(!(TEMPERATURE == "50" & FISH_ID == "LCKM158")) %>% 
  filter(!(TEMPERATURE == "50" & FISH_ID == "CSUD010")) %>% 
  filter(!(TEMPERATURE == "40" & FISH_ID == "CSUD018")) %>% 
  filter(!(TEMPERATURE == "20" & FISH_ID == "CTON061")) %>% 
  filter(!(TEMPERATURE == "50" & FISH_ID == "CTON065")) %>%
  filter(!(FISH_ID == "CTON069")) %>% 
  filter(!(UNIQUE_SAMPLE_ID %in% c("CTON060_50_3", 
                                   "CTON061_30_3", 
                                   "CTON061_40_3", 
                                   "CTON061_50_2", 
                                   "CTON061_50_3")))%>% 
  mutate(UNIQUE_SAMPLE_ID = str_sub(UNIQUE_SAMPLE_ID, end = -3))

Great! Now we have all the data points that we want to keep. However, the data needs to be manipulated so that we can pull slopes out of the absorption readings and then calculate enzyme activity based on those slopes. This will involve a number of steps.

Data calculations

Step1: Extract slopes

Step1 will produce a data frame with the slope obtained for cuvettes 1-3 for each sample run at each experimental temperature.

CS_activity <- cs3.filtered %>% 
  dplyr::group_by(UNIQUE_SAMPLE_ID, CUVETTE) %>% 
  do({
    mod = lm(result ~ MINUTES, data = .)
    data.frame(Intercept = coef(mod)[1],
               Slope = coef(mod)[2], 
               r2 = summary(mod)$adj.r.squared)
  }) %>%
  dplyr::ungroup() %>%
  filter(CUVETTE != ("6"))%>% 
  filter(CUVETTE != ("4"))%>% 
  filter(CUVETTE != ("5"))
Step2: Slope means

Step2 will calculate the mean slope across cuvettes 1-3.

CS_activity_means <- CS_activity %>%
  dplyr::group_by(UNIQUE_SAMPLE_ID) %>% 
  dplyr::mutate(Mean = mean(Slope))
Step3: Background activity level

Step3 will calculate the background activity level by measuring the slope from cuvette 5 (positive control).

CS_background <- cs3.filtered %>% 
  group_by(UNIQUE_SAMPLE_ID, CUVETTE) %>% 
  do({
    mod = lm(result ~ MINUTES, data = .)
    data.frame(Intercept = coef(mod)[1],
               Slope = coef(mod)[2], 
               r2 = summary(mod)$adj.r.squared)
  }) %>%
  ungroup() %>%
  filter(CUVETTE == ("5")) %>% 
  dplyr::rename(Background = Slope)
Step4: Merging dataframes

Step4 will merge the data frames that you created with the mean slopes and the background slopes.

final_table <- CS_activity %>% 
  full_join(distinct(CS_activity_means[,c(1,6)]), by = "UNIQUE_SAMPLE_ID") %>% 
  full_join(CS_background[,c(1,4)], by = "UNIQUE_SAMPLE_ID") 
final_table$Mean[duplicated(final_table$Mean)] <- ""
final_table$Background[duplicated(final_table$Background)] <- ""
final_table <- final_table %>% 
  mutate(Mean = as.numeric(Mean), 
         Background = as.numeric(Background), 
         Background_perc = Background/Mean) 
Step5: Enzyme activity levels

Step5 is where enzyme activity levels are calculated. See further details in the manuscript (doi: xxx). Within this step the background activity level is taken into account: it is subtracted from sample slopes whenever the background was more than 5% of the sample slope.
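As a worked CS example (made-up slopes, but the assay constants used in the pipeline; here the reaction increases absorbance, so no sign flip is needed):

```r
# hypothetical slopes: background is 10% of the sample slope, i.e. over the
# 5% threshold, so it IS subtracted before computing activity
mean_slope <- 0.20    # mean absorbance slope for cuvettes 1-3 (per minute)
background <- 0.02    # slope of cuvette 5 (positive control)
background_perc <- background / mean_slope

background2 <- if (background_perc <= 0.05) 0 else background
cs_absorbance <- mean_slope - background2

# assay constants: path length (cm), DTNB/TNB extinction coefficient at
# 412 nm, tissue concentration, assay and sample volumes
path_length <- 1; ext_coef <- 13.6; tissue_conc <- 0.01
assay_vol <- 0.930; sample_vol <- 0.020

cs_activity <- (cs_absorbance / (path_length * ext_coef * tissue_conc)) *
  (assay_vol / sample_vol)
cs_activity   # ~61.54
```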

CS.data <- final_table %>% 
  select(c(UNIQUE_SAMPLE_ID, Mean, Background, Background_perc)) %>% 
  mutate(Mean = as.numeric(Mean), 
         Background = as.numeric(Background), 
         Background_perc = as.numeric(Background_perc)) %>% 
  mutate(Background2 = case_when(Background_perc <= 0.05 ~ 0, 
                                 TRUE ~ Background), 
         CS_ABSORBANCE = Mean - Background2) %>%
  inner_join(select(cs3.filtered, c(UNIQUE_SAMPLE_ID, REGION, POPULATION, TEMPERATURE, FISH_ID)), by ="UNIQUE_SAMPLE_ID") %>% 
  inner_join(tissue.mass, by = "FISH_ID") %>% 
  mutate(TISSUE_MASS_CENTERED = scale(TISSUE_MASS, center = TRUE, scale = FALSE)) %>%
  distinct(UNIQUE_SAMPLE_ID, REGION, POPULATION, .keep_all = TRUE) %>% 
  mutate(REGION = factor(REGION),
         PATH_LENGTH = 1, 
         EXTINCTION_COEFFICIENT = 13.6, 
         TISSUE_CONCENTRATION = 0.01, 
         ASSAY_VOL = 0.930, 
         SAMPLE_VOL = 0.020, 
         CS_ACTIVITY = ((CS_ABSORBANCE/(PATH_LENGTH*EXTINCTION_COEFFICIENT*TISSUE_CONCENTRATION))*(ASSAY_VOL/SAMPLE_VOL)))

By the end of this stage you should have a data frame that includes a column called CS_ACTIVITY along with the necessary metadata - this data frame will be used to perform the statistical analysis.

Exploratory data analysis

CS v TEMPERATURE [LATITUDE]
ggplot(CS.data, aes(x =as.numeric(TEMPERATURE), y= CS_ACTIVITY, color = REGION)) + 
  geom_point() + geom_smooth(method = "lm", se=FALSE)

CS V TEMPERATURE [DENSITY]
ggplot(CS.data, aes(x = CS_ACTIVITY, fill = TEMPERATURE, color = TEMPERATURE)) + 
  geom_density(alpha =0.5, position = "identity") 

CS v TISSUE MASS (LATITUDE)
ggplot(CS.data, aes(x =TISSUE_MASS_CENTERED, y= CS_ACTIVITY, color = REGION)) + 
  geom_point() + geom_smooth(method = "lm", se=FALSE)

Fit the model

The model was fit using the glm function and later the glmmTMB package in R. A number of different models were tested to determine which hypothesis and associated variables best predicted CS enzyme activity. Model fit was examined using AICc, BIC, and r-squared values. Models were additionally examined via the validation diagnostics provided by the performance and DHARMa packages in R.

Fixed factors (linear regression models)
model 1
#--- base model ---#
cs.model.1 <- glm(CS_ACTIVITY ~ 1 + REGION*TEMPERATURE + TISSUE_MASS_CENTERED, 
                       family=gaussian(), 
                       data = CS.data)  

summary

Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.3913591 0.5554409 -2.5049635 0.0134922
REGIONLeading -0.7098815 0.8088522 -0.8776405 0.3817705
TEMPERATURE 0.1582065 0.0153212 10.3259736 0.0000000
TISSUE_MASS_CENTERED -0.0552608 0.0247762 -2.2304002 0.0274491
REGIONLeading:TEMPERATURE 0.0281522 0.0222011 1.2680542 0.2070624
model 2
cs.model.2 <- glm(CS_ACTIVITY ~ 1 + REGION*TEMPERATURE, 
                       family=gaussian(), 
                       data = CS.data) 

summary

Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.4651180 0.5628678 -2.6029523 0.0103162
REGIONLeading -0.5875519 0.8192343 -0.7171963 0.4745396
TEMPERATURE 0.1594164 0.0155439 10.2558665 0.0000000
REGIONLeading:TEMPERATURE 0.0269796 0.0225316 1.1974084 0.2333262
model comparison table
model df AICc BIC r2
cs.model.1 6 481.1462 497.8718 0.6531885
cs.model.2 5 484.0239 498.0443 0.6417364

The model that contains TISSUE_MASS_CENTERED performs better than the model that leaves TISSUE_MASS_CENTERED out. Therefore we will move ahead with the model that contains TISSUE_MASS_CENTERED as a covariate.
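AICc is not in base R (it usually comes from a helper package), but it is just AIC with a small-sample correction, AICc = AIC + 2k(k+1)/(n − k − 1). A sketch of the model-1 vs model-2 comparison on simulated data (all names and numbers here are illustrative):

```r
set.seed(42)
n    <- 60
temp <- runif(n, 27, 31.5)
mass <- rnorm(n)
y    <- -1.4 + 0.16 * temp - 0.06 * mass + rnorm(n, 0, 0.4)

m1 <- glm(y ~ temp + mass, family = gaussian())  # with the mass covariate
m2 <- glm(y ~ temp,        family = gaussian())  # without it

# Small-sample corrected AIC
aicc <- function(m) {
  k <- attr(logLik(m), "df")  # number of estimated parameters (incl. sigma)
  n <- nobs(m)
  AIC(m) + (2 * k * (k + 1)) / (n - k - 1)
}

c(m1 = aicc(m1), m2 = aicc(m2))  # lower is better
```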

Polynomials

polynomial models

Note that the linear model has already been created via model cs.model.1 in the previous section.

#--- second order polynomial ---# 
cs.model.1.p2 <- glm(CS_ACTIVITY ~ 1 + REGION*poly(TEMPERATURE, 2) + TISSUE_MASS_CENTERED, 
                      family=gaussian(), 
                      data = CS.data)  

#--- third order polynomial ---# 
cs.model.1.p3 <- glm(CS_ACTIVITY ~ 1 + REGION*poly(TEMPERATURE, 3) + TISSUE_MASS_CENTERED, 
                      family=gaussian(), 
                      data = CS.data)  
polynomial model comparisons
model df AICc BIC r2
cs.model.1 6 481.1462 497.8718 0.6600734
cs.model.1.p2 8 484.7957 506.8264 0.6622002
cs.model.1.p3 10 489.1793 516.3691 0.6628374

From our model comparison we can see that the model that treats temperature as a linear term performs best. Therefore, moving forward we will use the linear model.
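Note that poly() builds orthogonal polynomial terms, so the linear and quadratic components are uncorrelated and can be assessed cleanly; raw powers of temperature would be nearly collinear. A quick check using the four treatment temperatures:

```r
temp <- rep(c(27, 28.5, 30, 31.5), times = 10)  # the four treatment temperatures

P <- poly(temp, 2)    # orthogonal linear + quadratic terms
cor(P[, 1], P[, 2])   # effectively zero

cor(temp, temp^2)     # raw powers are almost perfectly correlated
```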

Random factors

Fish were repeatedly sampled at four different temperatures, so this repeated sampling needs to be accounted for by including random factors in the model. There are several options for the random-effect structure, including: 1) accounting for repeated sampling of individuals, 2) accounting for repeated sampling of individuals nested within population, and 3) accounting for repeated sampling of individuals and populations without nesting. All three models will be run and compared.

random factor models
cs.model.1a <- glmmTMB(CS_ACTIVITY ~ 1 + REGION*TEMPERATURE + TISSUE_MASS_CENTERED + (1|FISH_ID), 
                       family=gaussian(), 
                       data = CS.data, 
                       REML = TRUE) 

cs.model.1b <- glmmTMB(CS_ACTIVITY ~ 1 + REGION*TEMPERATURE + TISSUE_MASS_CENTERED + (1|POPULATION/FISH_ID), 
                       family=gaussian(), 
                       data = CS.data,
                       REML = TRUE) 

cs.model.1c <- glmmTMB(CS_ACTIVITY ~ 1 + REGION*TEMPERATURE + TISSUE_MASS_CENTERED + (1|FISH_ID) + (1 + REGION|POPULATION), 
                       family=gaussian(), 
                       data = CS.data,
                       REML = TRUE)
random factor model comparisons
model df AICc BIC r2m r2c
cs.model.1a 7 415.8190 435.2150 0.6471885 0.6471885
cs.model.1b 8 416.4204 438.4512 0.6304289 0.6304289
cs.model.1c 10 420.9272 448.1170 0.6305743 0.6305743

Model cs.model.1a appears to be the best model; however, there is little difference among the models regardless of how the random factors are arranged.

Model validation

performance
cs.model.1a (linear)

DHARMa residuals
cs.model.1a (linear)
cs.model.1a %>% simulateResiduals(plot=TRUE)

## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help. 
##  
## Scaled residual values: 0.632 0.612 0.56 0.52 0.22 0.188 0.092 0.088 0.276 0.352 0.576 0.684 0.504 0.36 0.056 0.552 0.736 0.592 0.24 0.18 ...
cs.model.1a %>% DHARMa::testResiduals(plot=TRUE)

## $uniformity
## 
##  Asymptotic one-sample Kolmogorov-Smirnov test
## 
## data:  simulationOutput$scaledResiduals
## D = 0.10788, p-value = 0.08839
## alternative hypothesis: two-sided
## 
## 
## $dispersion
## 
##  DHARMa nonparametric dispersion test via sd of residuals fitted vs.
##  simulated
## 
## data:  simulationOutput
## dispersion = 0.97124, p-value = 0.92
## alternative hypothesis: two.sided
## 
## 
## $outliers
## 
##  DHARMa outlier test based on exact binomial test with approximate
##  expectations
## 
## data:  simulationOutput
## outliers at both margin(s) = 2, observations = 134, p-value = 0.2892
## alternative hypothesis: true probability of success is not equal to 0.007968127
## 95 percent confidence interval:
##  0.001812671 0.052874653
## sample estimates:
## frequency of outliers (expected: 0.00796812749003984 ) 
##                                             0.01492537

The cs.model.1a model looks okay... let's try a link transformation to see if we can get any improvement.
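The link-transformed refits (cs.model.1a.log and cs.model.1a.inv below) are not shown in this excerpt; with glmmTMB only the family argument changes, e.g. family = gaussian(link = "log"). A base-R glm sketch of the same idea on simulated positive data (all names here are illustrative):

```r
set.seed(1)
temp     <- rep(c(27, 28.5, 30, 31.5), each = 15)
activity <- exp(-1.5 + 0.16 * temp) * exp(rnorm(60, 0, 0.1))  # positive response

m_identity <- glm(activity ~ temp, family = gaussian(link = "identity"))
m_log      <- glm(activity ~ temp, family = gaussian(link = "log"))

AIC(m_identity, m_log)  # lower AIC suggests the better-fitting link
```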

Model re-validation

performance
Gaussian (identity)

Gaussian (log)

Gaussian (inverse)

DHARMa
cs.model.1a %>% simulateResiduals(plot=TRUE)

## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help. 
##  
## Scaled residual values: 0.632 0.612 0.56 0.52 0.22 0.188 0.092 0.088 0.276 0.352 0.576 0.684 0.504 0.36 0.056 0.552 0.736 0.592 0.24 0.18 ...
cs.model.1a %>% DHARMa::testResiduals(plot=TRUE)

## $uniformity
## 
##  Asymptotic one-sample Kolmogorov-Smirnov test
## 
## data:  simulationOutput$scaledResiduals
## D = 0.10788, p-value = 0.08839
## alternative hypothesis: two-sided
## 
## 
## $dispersion
## 
##  DHARMa nonparametric dispersion test via sd of residuals fitted vs.
##  simulated
## 
## data:  simulationOutput
## dispersion = 0.97124, p-value = 0.92
## alternative hypothesis: two.sided
## 
## 
## $outliers
## 
##  DHARMa outlier test based on exact binomial test with approximate
##  expectations
## 
## data:  simulationOutput
## outliers at both margin(s) = 2, observations = 134, p-value = 0.2892
## alternative hypothesis: true probability of success is not equal to 0.007968127
## 95 percent confidence interval:
##  0.001812671 0.052874653
## sample estimates:
## frequency of outliers (expected: 0.00796812749003984 ) 
##                                             0.01492537
cs.model.1a.log %>% simulateResiduals(plot=TRUE)

## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help. 
##  
## Scaled residual values: 0.6 0.744 0.68 0.516 0.012 0.136 0.124 0.132 0.056 0.4 0.704 0.636 0.576 0.464 0.076 0.28 0.856 0.588 0.016 0.112 ...
cs.model.1a.log %>% DHARMa::testResiduals(plot=TRUE)

## $uniformity
## 
##  Asymptotic one-sample Kolmogorov-Smirnov test
## 
## data:  simulationOutput$scaledResiduals
## D = 0.085672, p-value = 0.279
## alternative hypothesis: two-sided
## 
## 
## $dispersion
## 
##  DHARMa nonparametric dispersion test via sd of residuals fitted vs.
##  simulated
## 
## data:  simulationOutput
## dispersion = 1.0783, p-value = 0.68
## alternative hypothesis: two.sided
## 
## 
## $outliers
## 
##  DHARMa outlier test based on exact binomial test with approximate
##  expectations
## 
## data:  simulationOutput
## outliers at both margin(s) = 1, observations = 134, p-value = 1
## alternative hypothesis: true probability of success is not equal to 0.007968127
## 95 percent confidence interval:
##  0.000188921 0.040877038
## sample estimates:
## frequency of outliers (expected: 0.00796812749003984 ) 
##                                            0.007462687
cs.model.1a.inv %>% simulateResiduals(plot=TRUE)

## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help. 
##  
## Scaled residual values: 0.38 0.532 0.648 0.712 0.188 0.372 0.436 0.632 0.248 0.416 0.64 0.344 0.46 0.596 0.572 0.336 0.592 0.684 0.236 0.296 ...
cs.model.1a.inv %>% DHARMa::testResiduals(plot=TRUE)

## $uniformity
## 
##  Asymptotic one-sample Kolmogorov-Smirnov test
## 
## data:  simulationOutput$scaledResiduals
## D = 0.21707, p-value = 6.554e-06
## alternative hypothesis: two-sided
## 
## 
## $dispersion
## 
##  DHARMa nonparametric dispersion test via sd of residuals fitted vs.
##  simulated
## 
## data:  simulationOutput
## dispersion = 1.0001e-05, p-value < 2.2e-16
## alternative hypothesis: two.sided
## 
## 
## $outliers
## 
##  DHARMa outlier test based on exact binomial test with approximate
##  expectations
## 
## data:  simulationOutput
## outliers at both margin(s) = 0, observations = 134, p-value = 0.6316
## alternative hypothesis: true probability of success is not equal to 0.007968127
## 95 percent confidence interval:
##  0.00000000 0.02715348
## sample estimates:
## frequency of outliers (expected: 0.00796812749003984 ) 
##                                                      0

Looking at the different models, the model with the log-link function performs best. In the DHARMa validation test we can see that one of the quantile deviations is violated. Because the model passes all the other validation checks relatively well, we could move on with the log-link model. However, we previously showed that the 2nd and 3rd order polynomials also performed quite well, and we know the LDH model was not linear. So before we choose our final model, let's see what the 2nd and 3rd order polynomials look like with a log-link.
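The fitting code for cs.model.1a.log.p2 is not shown in this excerpt; presumably it combines poly(TEMPERATURE, 2) with gaussian(link = "log") in the glmmTMB call. A minimal base-R analogue on simulated data (names and values illustrative):

```r
set.seed(2)
temp     <- rep(c(27, 28.5, 30, 31.5), each = 15)
mu       <- exp(0.3 + 0.25 * (temp - 27) - 0.02 * (temp - 27)^2)  # curved on the log scale
activity <- mu * exp(rnorm(60, 0, 0.1))

m_log_p2 <- glm(activity ~ poly(temp, 2), family = gaussian(link = "log"))
coef(m_log_p2)  # intercept plus orthogonal linear and quadratic terms
```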

Model re-re-validation

performance
Gaussian (quadratic-log)

DHARMa
Gaussian (quadratic-log)
cs.model.1a.log.p2 %>% simulateResiduals(plot=TRUE)

## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help. 
##  
## Scaled residual values: 0.788 0.72 0.596 0.572 0.02 0.092 0.052 0.144 0.104 0.364 0.656 0.808 0.564 0.388 0.088 0.404 0.852 0.664 0.032 0.092 ...
cs.model.1a.log.p2 %>% DHARMa::testResiduals(plot=TRUE)

## $uniformity
## 
##  Asymptotic one-sample Kolmogorov-Smirnov test
## 
## data:  simulationOutput$scaledResiduals
## D = 0.067522, p-value = 0.5743
## alternative hypothesis: two-sided
## 
## 
## $dispersion
## 
##  DHARMa nonparametric dispersion test via sd of residuals fitted vs.
##  simulated
## 
## data:  simulationOutput
## dispersion = 1.2149, p-value = 0.336
## alternative hypothesis: two.sided
## 
## 
## $outliers
## 
##  DHARMa outlier test based on exact binomial test with approximate
##  expectations
## 
## data:  simulationOutput
## outliers at both margin(s) = 2, observations = 134, p-value = 0.2892
## alternative hypothesis: true probability of success is not equal to 0.007968127
## 95 percent confidence interval:
##  0.001812671 0.052874653
## sample estimates:
## frequency of outliers (expected: 0.00796812749003984 ) 
##                                             0.01492537

Validations look great! Moving ahead with the quadratic log-link model.

Partial plots

ggemmeans

plot_model

Model investigation

summary
Estimate StdError Zvalue Pvalue
(Intercept) 1.2550450 0.0617039 20.3398046 0.0000000
REGIONLeading 0.0687295 0.0915266 0.7509231 0.4526989
poly(TEMPERATURE, 2)1 5.3479095 0.2386576 22.4082958 0.0000000
poly(TEMPERATURE, 2)2 -0.8916249 0.1859382 -4.7952749 0.0000016
TISSUE_MASS_CENTERED -0.0081661 0.0088766 -0.9199515 0.3575981
REGIONLeading:poly(TEMPERATURE, 2)1 0.5401241 0.3509571 1.5390032 0.1238035
REGIONLeading:poly(TEMPERATURE, 2)2 0.1504879 0.2680341 0.5614505 0.5744905
Anova
Chisq Df Pr(>Chisq)
REGION 1.3568748 1 0.2440798
poly(TEMPERATURE, 2) 1364.8597249 2 0.0000000
TISSUE_MASS_CENTERED 0.8463107 1 0.3575981
REGION:poly(TEMPERATURE, 2) 6.3016113 2 0.0428176
confint
2.5 % 97.5 % Estimate
(Intercept) 1.1341076 1.3759824 1.2550450
REGIONLeading -0.1106594 0.2481183 0.0687295
poly(TEMPERATURE, 2)1 4.8801492 5.8156697 5.3479095
poly(TEMPERATURE, 2)2 -1.2560572 -0.5271927 -0.8916249
TISSUE_MASS_CENTERED -0.0255639 0.0092318 -0.0081661
REGIONLeading:poly(TEMPERATURE, 2)1 -0.1477392 1.2279874 0.5401241
REGIONLeading:poly(TEMPERATURE, 2)2 -0.3748493 0.6758250 0.1504879
Std.Dev.(Intercept)|FISH_ID 0.1981099 0.3245578 0.2535708
r-squared
R2_conditional R2_marginal optional
0.5624371 0.4461464 FALSE

Pairwise comparisons

emtrends [latitudes]
cs.model.1a.log.p2  %>% emtrends(var = "TEMPERATURE", type = "response") %>% pairs(by = "TEMPERATURE") %>% summary(by = NULL, adjust = "tukey", infer=TRUE)

SCROLL TO THE RIGHT -->

The values in the leftmost column of the table indicate that the slopes are estimated at the mean TISSUE_MASS_CENTERED value when comparing slopes between latitudes.

emmeans [latitudes]
cs.model.1a.log.p2  %>% emmeans(pairwise ~ TEMPERATURE*REGION, type = "response") %>% pairs(by = "TEMPERATURE") %>% summary(by = NULL, adjust = "tukey", infer=TRUE)
TEMPERATURE
cs.model.1a.log.p2  %>% emmeans(~ TEMPERATURE*REGION, type = "response")  %>% summary(infer=TRUE)
Means - f(TEMPERATURE)
cs.model.1a.log.p2  %>% update(.~1+ REGION * as.factor(TEMPERATURE) + TISSUE_MASS_CENTERED + (1|FISH_ID)) %>% 
  emmeans(~REGION*TEMPERATURE, type = "response") %>% summary(infer=TRUE)
Abs. diff - f(TEMPERATURE)
cs.model.1a.log.p2  %>% update(.~1+ REGION * as.factor(TEMPERATURE) + TISSUE_MASS_CENTERED + (1|FISH_ID)) %>% 
  emmeans(~REGION*TEMPERATURE, type = "response") %>% pairs(by ="REGION") %>% summary(infer=TRUE)
Effect size
cs.emm <- cs.model.1a.log.p2 %>% emmeans(~REGION*TEMPERATURE)
eff_size(cs.emm, sigma = sigma(cs.model.1a.log.p2), edf=df.residual(cs.model.1a.log.p2))
##  contrast                                                              
##  Core TEMPERATURE34.6268656716418 - Leading TEMPERATURE34.6268656716418
##  effect.size    SE  df lower.CL upper.CL
##       -0.107 0.188 125    -0.48    0.265
## 
## sigma used for effect sizes: 0.4919 
## Confidence level used: 0.95
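eff_size() reports a Cohen's-d-style standardized difference: the marginal-mean contrast divided by the model's residual sigma, with edf controlling the degrees of freedom for the confidence interval. The core calculation can be sketched in base R (all numbers hypothetical):

```r
# Hypothetical marginal means for the two regions and the model's residual sd
emm_core    <- 2.25
emm_leading <- 2.15
sigma_hat   <- 0.49  # as returned by sigma(model)

d <- (emm_core - emm_leading) / sigma_hat  # standardized effect size
d
```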

Summary figure

Conclusion

  • In conclusion, CS enzyme activity is significantly positively correlated with temperature; however, there is no significant difference in the relationship between temperature and CS activity when comparing fish from low- and high-latitude regions.

Lactate dehydrogenase: citrate synthase

Before beginning always make sure that you are working in the correct directory

knitr::opts_knit$set(root.dir=working.dir)

Now we can import the data. Two different data frames are being imported. The first contains all the enzyme wavelength absorption data for each sample, and the tissue.mass data file contains information about the tissue sample used for each assay. Later on these two data frames will be merged.

Load data

Data manipulation

Both the LDH and CS data frames have been previously cleaned and manipulated; therefore, the only remaining steps are to join the two data frames together and create another column containing the LDH:CS ratio.

#--- data preparation/manipulation ---# 
ldh.cs.data <- ldh.data %>% 
  inner_join(select(cs.data, c("UNIQUE_SAMPLE_ID","CS_ACTIVITY")), by = "UNIQUE_SAMPLE_ID") %>% 
  mutate(LCr = LDH_ACTIVITY/CS_ACTIVITY)

Exploratory data analysis

LDH:CS v TEMPERATURE [LATITUDE]
ggplot(ldh.cs.data, aes(x =as.numeric(temperature), y= LCr, color = REGION)) + 
  geom_point() + geom_smooth(method = "lm", se=FALSE)

LDH:CS v TEMPERATURE [DENSITY]
ggplot(ldh.cs.data, aes(x = LCr)) + 
  geom_density(alpha =0.5, position = "identity") 

LDH:CS v TISSUE MASS (LATITUDE)
ggplot(ldh.cs.data, aes(x =TISSUE_MASS_CENTERED, y= LCr, color = REGION)) + 
  geom_point() + geom_smooth(method = "lm", se=FALSE)

Fit the model

The model was fit using the glm function and later the glmmTMB package in R. A number of different models were tested to determine which hypothesis and associated variables best predicted the LDH:CS ratio. Model fit was examined using AICc, BIC, and r-squared values. Additional models were examined via the validation diagnostics provided by the performance and DHARMa packages in R.

Fixed factors (linear regression models)
model 1
#--- base model ---#
ldh.cs.model.1 <- glm(LCr~ 1 + REGION*temperature + TISSUE_MASS_CENTERED, 
                       family=gaussian(), 
                       data = ldh.cs.data)  

summary

Estimate Std. Error t value Pr(>|t|)
(Intercept) 15.5983207 5.0802772 3.0703680 0.0026153
REGIONLeading -3.9718124 7.4737412 -0.5314356 0.5960452
temperature 0.4263166 0.1403483 3.0375611 0.0028954
TISSUE_MASS_CENTERED -0.5451098 0.2308028 -2.3617987 0.0197067
REGIONLeading:temperature -0.0072626 0.2047482 -0.0354710 0.9717599
model 2
ldh.cs.model.2 <- glm(LCr ~ 1 + REGION*temperature, 
                       family=gaussian(), 
                       data = ldh.cs.data) 

summary

Estimate Std. Error t value Pr(>|t|)
(Intercept) 14.9399468 5.1625412 2.8939133 0.0044739
REGIONLeading -2.9630790 7.5937909 -0.3901976 0.6970390
temperature 0.4346143 0.1427914 3.0437008 0.0028368
REGIONLeading:temperature -0.0116615 0.2083689 -0.0559655 0.9554565
model comparison table
model df AICc BIC r2
ldh.cs.model.1 6 1058.427 1075.052 0.1609548
ldh.cs.model.2 5 1061.906 1075.843 0.1259477

The model that contains TISSUE_MASS_CENTERED performs better than the model that leaves TISSUE_MASS_CENTERED out. Therefore we will move ahead with the model that contains TISSUE_MASS_CENTERED as a covariate.

Polynomials
polynomial models

Note that the linear model has already been created via model ldh.cs.model.1 in the previous section.

#--- second order polynomial ---# 
ldh.cs.model.1.p2 <- glm(LCr ~ 1 + REGION*poly(temperature, 2) + TISSUE_MASS_CENTERED, 
                      family=gaussian(), 
                      data = ldh.cs.data)  

#--- third order polynomial ---# 
ldh.cs.model.1.p3 <- glm(LCr ~ 1 + REGION*poly(temperature, 3) + TISSUE_MASS_CENTERED, 
                      family=gaussian(), 
                      data = ldh.cs.data)  

polynomial model comparisons

model df AICc BIC r2
ldh.cs.model.1 6 1058.427 1075.052 0.1651868
ldh.cs.model.1.p2 8 1062.050 1083.942 0.1707095
ldh.cs.model.1.p3 10 1064.173 1091.183 0.1864203

From our model comparison we can see that the model that treats temperature as a linear term performs best. Therefore, moving forward we will use the linear model.

Random factors

Fish were repeatedly sampled at four different temperatures, so this repeated sampling needs to be accounted for by including random factors in the model. There are several options for the random-effect structure, including: 1) accounting for repeated sampling of individuals, 2) accounting for repeated sampling of individuals nested within population, and 3) accounting for repeated sampling of individuals and populations without nesting. All three models will be run and compared.

random factor models
ldh.cs.model.1a <- glmmTMB(LCr ~ 1 + REGION*temperature + TISSUE_MASS_CENTERED + (1|fish_id), 
                       family=gaussian(), 
                       data = ldh.cs.data, 
                       REML = TRUE) 

ldh.cs.model.1b <- glmmTMB(LCr ~ 1 + REGION*temperature + TISSUE_MASS_CENTERED + (1|POPULATION/fish_id), 
                       family=gaussian(), 
                       data = ldh.cs.data,
                       REML = TRUE) 

ldh.cs.model.1c <- glmmTMB(LCr ~ 1 + REGION*temperature + TISSUE_MASS_CENTERED + (1|fish_id) + (1 + REGION|POPULATION), 
                       family=gaussian(), 
                       data = ldh.cs.data,
                       REML = TRUE) # convergence problem

random factor model comparisons

model df AICc BIC r2m r2c
ldh.cs.model.1a 7 994.5411 1013.818 0.1483192 0.1483192
ldh.cs.model.1b 8 996.8086 1018.700 0.1483183 0.1483183
ldh.cs.model.1c 10 NA NA 0.1483192 0.1483192

Model ldh.cs.model.1a appears to be the best model; however, there is little difference among the models regardless of how the random factors are arranged.

Model validation

performance
ldh.cs.model.1a (linear)

DHARMa residuals
ldh.cs.model.1a (linear)
ldh.cs.model.1a %>% simulateResiduals(plot=TRUE)

## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help. 
##  
## Scaled residual values: 0.072 0.052 0.076 0.152 0.944 0.512 0.9 0.864 0.708 0.692 0.436 0.66 0.764 0.924 0.984 0.272 0.136 0.388 1 0.868 ...
ldh.cs.model.1a %>% DHARMa::testResiduals(plot=TRUE)

## $uniformity
## 
##  Asymptotic one-sample Kolmogorov-Smirnov test
## 
## data:  simulationOutput$scaledResiduals
## D = 0.098, p-value = 0.1584
## alternative hypothesis: two-sided
## 
## 
## $dispersion
## 
##  DHARMa nonparametric dispersion test via sd of residuals fitted vs.
##  simulated
## 
## data:  simulationOutput
## dispersion = 0.91401, p-value = 0.624
## alternative hypothesis: two.sided
## 
## 
## $outliers
## 
##  DHARMa outlier test based on exact binomial test with approximate
##  expectations
## 
## data:  simulationOutput
## outliers at both margin(s) = 2, observations = 132, p-value = 0.2834
## alternative hypothesis: true probability of success is not equal to 0.007968127
## 95 percent confidence interval:
##  0.001840215 0.053659931
## sample estimates:
## frequency of outliers (expected: 0.00796812749003984 ) 
##                                             0.01515152

The ldh.cs.model.1a model looks good, and there seem to be no major violations of assumptions.

Partial plots

ggemmeans

plot_model

Model investigation

summary
Estimate StdError Zvalue Pvalue
(Intercept) 14.7100633 3.9311867 3.7418888 0.0001826
REGIONLeading -3.3913653 5.8092603 -0.5837861 0.5593642
temperature 0.4426002 0.0835813 5.2954427 0.0000001
TISSUE_MASS_CENTERED -0.4572146 0.3990826 -1.1456642 0.2519341
REGIONLeading:temperature -0.0132004 0.1215385 -0.1086113 0.9135108
Anova
Chisq Df Pr(>Chisq)
REGION 0.9281186 1 0.3353523
temperature 51.7028532 1 0.0000000
TISSUE_MASS_CENTERED 1.3125463 1 0.2519341
REGION:temperature 0.0117964 1 0.9135108
confint
2.5 % 97.5 % Estimate
(Intercept) 7.0050790 22.4150476 14.7100633
REGIONLeading -14.7773062 7.9945757 -3.3913653
temperature 0.2787838 0.6064166 0.4426002
TISSUE_MASS_CENTERED -1.2394020 0.3249729 -0.4572146
REGIONLeading:temperature -0.2514115 0.2250106 -0.0132004
Std.Dev.(Intercept)|fish_id 8.3080065 14.5084898 10.9789174
r-squared
R2_conditional R2_marginal optional
0.7240186 0.1483192 FALSE

Pairwise comparisons

emtrends [latitudes]
ldh.cs.model.1a  %>% emtrends(var = "temperature", type = "response") %>% pairs(by = "temperature") %>% summary(by = NULL, adjust = "tukey", infer=TRUE)

SCROLL TO THE RIGHT -->

The values in the leftmost column of the table indicate that the slopes are estimated at the mean TISSUE_MASS_CENTERED value when comparing slopes between latitudes.

emmeans [latitudes]
ldh.cs.model.1a  %>% emmeans(pairwise ~ temperature*REGION, type = "response") %>% pairs(by = "temperature") %>% summary(by = NULL, adjust = "tukey", infer=TRUE)
TEMPERATURE
ldh.cs.model.1a  %>% emmeans(~ temperature*REGION, type = "response")  %>% summary(infer=TRUE)
Means - f(TEMPERATURE)
ldh.cs.model.1a  %>% update(.~1+ REGION * as.factor(temperature) + TISSUE_MASS_CENTERED + (1|fish_id)) %>% 
  emmeans(~REGION*temperature, type = "response") %>% summary(infer=TRUE)
Abs. diff - f(TEMPERATURE)
ldh.cs.model.1a  %>% update(.~1+ REGION * as.factor(temperature) + TISSUE_MASS_CENTERED + (1|fish_id)) %>% 
  emmeans(~REGION*temperature, type = "response") %>% pairs(by ="REGION") %>% summary(infer=TRUE)
Effect size
ldh.cs.emm <- ldh.cs.model.1a %>% emmeans(~REGION*temperature)
eff_size(ldh.cs.emm, sigma = sigma(ldh.cs.model.1a), edf=df.residual(ldh.cs.model.1a))
##  contrast                                                              
##  Core temperature34.6969696969697 - Leading temperature34.6969696969697
##  effect.size    SE  df lower.CL upper.CL
##        0.506 0.527 130   -0.535     1.55
## 
## sigma used for effect sizes: 7.602 
## Confidence level used: 0.95

Summary figure

Conclusion

  • In conclusion, the LDH:CS ratio is significantly positively correlated with temperature; however, there is no significant difference in the relationship between temperature and LDH:CS when comparing fish from low- and high-latitude regions.

Immunocompetence

Scenario

For initial details on the experiment performed please read the ReadMe file. In brief, Acanthochromis polyacanthus from two different regions on the Great Barrier Reef (GBR) were tested for metabolic performance at four different temperatures: 27\(^\circ\)C, 28.5\(^\circ\)C, 30\(^\circ\)C, and 31.5\(^\circ\)C. Fish used in this study were collected from two different regions, low-latitude (i.e., Cairns) and high-latitude (i.e., Mackay); within each region, fish were collected from a total of three different populations.

Immunocompetence was tested via phytohaemagglutinin (PHA) swelling assays at the same four experimental temperatures at which metabolic performance was tested. To perform the assay, fish were injected subcutaneously with 0.03 mL of PHA in the caudal peduncle. The thickness of the injection site was measured pre-injection as well as 18-24 hours post-injection. PHA produces a localized, cell-mediated response (e.g., inflammation, T-cell proliferation, etc.). The change in thickness between measurement periods was used as a proxy for immunocompetence.
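The response variable is simply the post-injection minus pre-injection thickness at the injection site. As a toy example (values hypothetical, in mm):

```r
pre_mm  <- c(1.10, 1.05, 1.20)  # pre-injection thickness
post_mm <- c(1.45, 1.30, 1.62)  # 18-24 h post-injection thickness

immune_response <- post_mm - pre_mm  # swelling used as the immunocompetence proxy
immune_response
```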

Read in the data

Before beginning always make sure that you are working in the correct directory

knitr::opts_knit$set(root.dir=working.dir)

Now we can import the data. Replace import data with the PATH to your data file. I have secretly labelled my PATH import.data (i.e. import.data = “PATH TO MY FILE”)

Load data

pha <- import.data

Data manipulation

Before the data can be analysed it is important to clean up the data file. Below a number of adjustments are made, primarily ensuring that columns are treated appropriately as factors, numeric, or time, as well as renaming some columns.

pha2 <- pha %>% 
  dplyr::rename(PHA_28.5 = PHA_285) %>%
  mutate(FISH_ID = factor(FISH_ID), 
         POPULATION = factor(POPULATION), 
         REGION = factor(REGION), 
         TANK = factor(TANK), 
         PHA_28.5 = as.numeric(PHA_28.5), 
         MASS_CENTERED = scale(MASS, center = TRUE, scale = FALSE)) %>% 
  pivot_longer(cols = c(PHA_27, 
                        PHA_28.5, 
                        PHA_30, 
                        PHA_31.5), 
               names_to = 'PHA', 
               values_to = 'IMMUNE_RESPONSE') %>% 
  separate(col = PHA, 
           into = c('TEST','TEMPERATURE'), sep = '_') %>% 
  filter(IMMUNE_RESPONSE >= 0.01) %>% # retain responses >= 0.01 (removes negative and near-zero values)
  mutate(TEMPERATURE = as.numeric(TEMPERATURE))

Great! That is everything for data manipulation

Exploratory data analysis

PHA V TEMP

ggplot(pha2, aes(x=TEMPERATURE, y=IMMUNE_RESPONSE)) + 
  geom_violin(alpha = 0.5) +  # four potential outliers but will keep for now 
  geom_point() 

PHA v TEMP (LATITUDE)

ggplot(pha2, aes(x=TEMPERATURE, y=IMMUNE_RESPONSE, fill = REGION, color = REGION)) + 
  geom_violin(alpha = 0.5) + 
  geom_point(position = position_jitterdodge(dodge.width = 0.9, jitter.width = 0), color = "black")

PHA v MASS (LATITUDE)

ggplot(pha2, aes(x=MASS_CENTERED, y=IMMUNE_RESPONSE, fill = REGION, color = REGION)) +
  geom_point() + geom_smooth(method = "lm")

Fit the model

The model was fit using the glm and later the glmmTMB package in R. A number of different models were tested to determine which hypothesis and associated variables best predicted immune response. Model fit was examined using AICc, BIC, and r-squared values. Additional model validation was performed via the diagnostics provided by the performance and DHARMa packages in R.
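A sketch of that comparison workflow (assuming the candidate models defined below have been fit; AICc via the MuMIn package and r-squared via the performance package):

```r
library(MuMIn)        # AICc()
library(performance)  # r2()

# Compare candidate models on information criteria and explained variance
AICc(pha.1, pha.2)
BIC(pha.1, pha.2)
r2(pha.1)
r2(pha.2)
```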

Fixed factors (linear regression models)

model 1
#--- base model ---#
pha.1 <- glm(IMMUNE_RESPONSE ~ 1 + REGION * TEMPERATURE + MASS_CENTERED, 
                     family=gaussian(), 
                     data = pha2) 
summary
| | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | 1.7185981 | 0.4222908 | 4.0697030 | 0.0000750 |
| REGIONLeading | -0.6055925 | 0.6198923 | -0.9769318 | 0.3301350 |
| TEMPERATURE | -0.0504956 | 0.0146466 | -3.4476083 | 0.0007294 |
| MASS_CENTERED | 3.0066564 | 2.2817687 | 1.3176868 | 0.1895654 |
| REGIONLeading:TEMPERATURE | 0.0207950 | 0.0213983 | 0.9718052 | 0.3326714 |
model 2
#--- model without mass ---#
pha.2 <- glm(IMMUNE_RESPONSE ~ 1 + REGION * TEMPERATURE, 
                     family=gaussian(), 
                     data = pha2)  
summary
| | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | 1.6908547 | 0.4227662 | 3.9995034 | 0.0000980 |
| REGIONLeading | -0.6042622 | 0.6213620 | -0.9724801 | 0.3323269 |
| TEMPERATURE | -0.0489398 | 0.0146335 | -3.3443600 | 0.0010344 |
| REGIONLeading:TEMPERATURE | 0.0195688 | 0.0214288 | 0.9132038 | 0.3625536 |
model comparison table
| model | df | AICc | BIC | r2 |
|---|---|---|---|---|
| pha.1 | 6 | -22.05659 | -4.195801 | 0.1032647 |
| pha.2 | 5 | -22.43443 | -7.482064 | 0.0939358 |

There is little difference between the two initial models; therefore, we will move forward with the model that has fewer terms.

Next we will test whether the variable TEMPERATURE performs best as a 1^st (linear), 2^nd (quadratic), or 3^rd (cubic) order polynomial, as the relationship between temperature and immune response is predicted to be non-linear.

Polynomials

polynomial models

Note that the linear model has already been created via model pha.2 in the previous section.

pha.2.p2 <- glm(IMMUNE_RESPONSE ~ 1 + REGION * poly(TEMPERATURE, 2), 
                 family=gaussian(),
                 data = pha2)  

pha.2.p3 <- glm(IMMUNE_RESPONSE ~ 1 + REGION * poly(TEMPERATURE, 3), 
                 family=gaussian(),
                 data = pha2)
polynomial model comparisons
| model | df | AICc | BIC | r2 |
|---|---|---|---|---|
| pha.2 | 5 | -22.43443 | -7.482064 | 0.0955802 |
| pha.2.p2 | 7 | -33.95206 | -13.211452 | 0.1814782 |
| pha.2.p3 | 9 | -36.46535 | -10.053268 | 0.2166317 |

From our model comparison we can see that the model improves when TEMPERATURE is modeled as a 2\(^nd\) or 3\(^rd\) order polynomial. The model that implements a 3\(^rd\) order polynomial performs the best; therefore, we will be moving forward with this model.

Random factors

Fish were repeatedly sampled over four different temperatures; therefore, repeated sampling needs to be accounted for. To do this, random factors will be included within the model. There are several options for random factors, including: 1) accounting for repeated sampling of individuals, 2) accounting for repeated sampling of individuals nested within population, and 3) accounting for repeated sampling of individuals and populations without nesting. All three models will be run and compared.

random factor models
pha.2.p3a <- glmmTMB(IMMUNE_RESPONSE ~ 1 + REGION * poly(TEMPERATURE, 3) + (1|FISH_ID), 
                  family=gaussian(),
                  data = pha2,
                  REML = TRUE) 


pha.2.p3b <- glmmTMB(IMMUNE_RESPONSE ~ 1 + REGION * poly(TEMPERATURE, 3) + (1|POPULATION/FISH_ID), 
                  family=gaussian(),
                  data = pha2,
                  REML = TRUE)

pha.2.p3c <- glmmTMB(IMMUNE_RESPONSE ~ 1 + REGION * poly(TEMPERATURE, 3) + (1|FISH_ID) + (1|POPULATION), 
                  family=gaussian(),
                  data = pha2,
                  REML = TRUE)
random factor model comparisons
| model | df | AICc | BIC | r2m | r2c |
|---|---|---|---|---|---|
| pha.2.p3a | 10 | -19.04376 | 10.15879 | 0.2090404 | 0.2090404 |
| pha.2.p3b | 11 | -18.95044 | 13.01158 | 0.2022862 | 0.2517016 |
| pha.2.p3c | 11 | -18.95044 | 13.01158 | 0.2022862 | 0.2517016 |

There is little difference between the models; however, the nested model does seem to perform slightly better than the non-nested model that only includes (1|FISH_ID). There is no difference between the second and third models, so either could be used. Moving forward, the second model with the nested random effects will be used.
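The comparison table above can be assembled along these lines (a sketch; marginal and conditional r-squared values extracted via the performance package):

```r
library(MuMIn)        # AICc()
library(performance)  # r2()

# Information criteria and marginal (r2m) / conditional (r2c) r-squared for
# the three candidate random-effect structures
mods <- list(pha.2.p3a = pha.2.p3a, pha.2.p3b = pha.2.p3b, pha.2.p3c = pha.2.p3c)
data.frame(
  AICc = sapply(mods, AICc),
  BIC  = sapply(mods, BIC),
  r2m  = sapply(mods, function(m) r2(m)$R2_marginal),
  r2c  = sapply(mods, function(m) r2(m)$R2_conditional)
)
```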

Model validation

performance

pha.2.p3b

DHARMa residuals

pha.2.p3b
pha.2.p3b %>% simulateResiduals(plot=TRUE)

## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help. 
##  
## Scaled residual values: 0.336 0.116 0.828 0.4 0.232 0.192 0.504 0.036 0.284 0.42 0.42 0.52 0.476 0.648 0.536 0.192 0.172 0.52 0.44 0.74 ...
pha.2.p3b %>% DHARMa::testResiduals(plot=TRUE)

## $uniformity
## 
##  Asymptotic one-sample Kolmogorov-Smirnov test
## 
## data:  simulationOutput$scaledResiduals
## D = 0.10325, p-value = 0.06743
## alternative hypothesis: two-sided
## 
## 
## $dispersion
## 
##  DHARMa nonparametric dispersion test via sd of residuals fitted vs.
##  simulated
## 
## data:  simulationOutput
## dispersion = 0.94119, p-value = 0.656
## alternative hypothesis: two.sided
## 
## 
## $outliers
## 
##  DHARMa outlier test based on exact binomial test with approximate
##  expectations
## 
## data:  simulationOutput
## outliers at both margin(s) = 2, observations = 159, p-value = 0.3618
## alternative hypothesis: true probability of success is not equal to 0.007968127
## 95 percent confidence interval:
##  0.001526975 0.044697834
## sample estimates:
## frequency of outliers (expected: 0.00796812749003984 ) 
##                                             0.01257862

The pha.2.p3b model performs well; however, in the model validation performed by the performance package our modeled predictive lines don’t match the observed data as well as we might hope. There are also some issues with the residuals in our DHARMa validations. Let’s see if we can fix this by trying different link functions within our model.
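The log- and inverse-link models used in the re-validation below (pha.2.p3b.log and pha.2.p3b.inv) are refits of the same formula with alternative Gaussian link functions. A reconstruction, as the original code chunks are not shown:

```r
# Same fixed and random structure, alternative link functions for the
# Gaussian family (reconstructed; the original chunks are not displayed).
pha.2.p3b.log <- glmmTMB(IMMUNE_RESPONSE ~ 1 + REGION * poly(TEMPERATURE, 3) + (1|POPULATION/FISH_ID),
                         family = gaussian(link = "log"),
                         data = pha2,
                         REML = TRUE)

pha.2.p3b.inv <- glmmTMB(IMMUNE_RESPONSE ~ 1 + REGION * poly(TEMPERATURE, 3) + (1|POPULATION/FISH_ID),
                         family = gaussian(link = "inverse"),
                         data = pha2,
                         REML = TRUE)
```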

Model re-validation

performance

Gaussian (identity)

Gaussian (log)

Gaussian (inverse)

DHARMa

Gaussian (identity)
pha.2.p3b %>% simulateResiduals(plot=TRUE)

## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help. 
##  
## Scaled residual values: 0.34 0.104 0.84 0.396 0.228 0.192 0.496 0.032 0.284 0.428 0.408 0.536 0.464 0.652 0.544 0.188 0.172 0.536 0.44 0.748 ...
pha.2.p3b %>% DHARMa::testResiduals(plot=TRUE)

## $uniformity
## 
##  Asymptotic one-sample Kolmogorov-Smirnov test
## 
## data:  simulationOutput$scaledResiduals
## D = 0.096377, p-value = 0.1043
## alternative hypothesis: two-sided
## 
## 
## $dispersion
## 
##  DHARMa nonparametric dispersion test via sd of residuals fitted vs.
##  simulated
## 
## data:  simulationOutput
## dispersion = 1.0064, p-value = 0.912
## alternative hypothesis: two.sided
## 
## 
## $outliers
## 
##  DHARMa outlier test based on exact binomial test with approximate
##  expectations
## 
## data:  simulationOutput
## outliers at both margin(s) = 2, observations = 159, p-value = 0.3618
## alternative hypothesis: true probability of success is not equal to 0.007968127
## 95 percent confidence interval:
##  0.001526975 0.044697834
## sample estimates:
## frequency of outliers (expected: 0.00796812749003984 ) 
##                                             0.01257862
Gaussian (log)
pha.2.p3b.log %>% simulateResiduals(plot=TRUE)

## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help. 
##  
## Scaled residual values: 0.34 0.096 0.856 0.384 0.28 0.172 0.564 0.024 0.324 0.448 0.376 0.568 0.524 0.692 0.632 0.188 0.18 0.556 0.532 0.82 ...
pha.2.p3b.log %>% DHARMa::testResiduals(plot=TRUE)

## $uniformity
## 
##  Asymptotic one-sample Kolmogorov-Smirnov test
## 
## data:  simulationOutput$scaledResiduals
## D = 0.070591, p-value = 0.4065
## alternative hypothesis: two-sided
## 
## 
## $dispersion
## 
##  DHARMa nonparametric dispersion test via sd of residuals fitted vs.
##  simulated
## 
## data:  simulationOutput
## dispersion = 1.0116, p-value = 0.88
## alternative hypothesis: two.sided
## 
## 
## $outliers
## 
##  DHARMa outlier test based on exact binomial test with approximate
##  expectations
## 
## data:  simulationOutput
## outliers at both margin(s) = 1, observations = 159, p-value = 1
## alternative hypothesis: true probability of success is not equal to 0.007968127
## 95 percent confidence interval:
##  0.0001592188 0.0345421401
## sample estimates:
## frequency of outliers (expected: 0.00796812749003984 ) 
##                                            0.006289308
Gaussian (inverse)
pha.2.p3b.inv %>% simulateResiduals(plot=TRUE)

## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help. 
##  
## Scaled residual values: 0.352 0.128 0.9 0.372 0.236 0.212 0.596 0.032 0.312 0.424 0.42 0.596 0.54 0.688 0.636 0.208 0.196 0.548 0.484 0.84 ...
pha.2.p3b.inv %>% DHARMa::testResiduals(plot=TRUE)

## $uniformity
## 
##  Asymptotic one-sample Kolmogorov-Smirnov test
## 
## data:  simulationOutput$scaledResiduals
## D = 0.080302, p-value = 0.2568
## alternative hypothesis: two-sided
## 
## 
## $dispersion
## 
##  DHARMa nonparametric dispersion test via sd of residuals fitted vs.
##  simulated
## 
## data:  simulationOutput
## dispersion = 0.099362, p-value = 0.872
## alternative hypothesis: two.sided
## 
## 
## $outliers
## 
##  DHARMa outlier test based on exact binomial test with approximate
##  expectations
## 
## data:  simulationOutput
## outliers at both margin(s) = 0, observations = 159, p-value = 0.6421
## alternative hypothesis: true probability of success is not equal to 0.007968127
## 95 percent confidence interval:
##  0.00000000 0.02293344
## sample estimates:
## frequency of outliers (expected: 0.00796812749003984 ) 
##                                                      0

Adding log or inverse link functions to the model does not help; in fact, it seems to make the model worse! From here we can experiment with different distributions. The first distribution that comes to mind is the Gamma distribution, as it can be helpful for skewed data when the data set contains only positive, non-zero values.

Fit model - alternative distributions

pha.2.p3b <- glmmTMB(IMMUNE_RESPONSE ~ 1 + REGION * poly(TEMPERATURE, 3) + (1|POPULATION/FISH_ID), 
                  family=gaussian(),
                  data = pha2,
                  REML = FALSE) 

pha.2.p3b.gamma <- glmmTMB(IMMUNE_RESPONSE~ 1 + REGION* poly(TEMPERATURE, 3) + (1|POPULATION/FISH_ID), 
                       family=Gamma(link="log"), # log link (note: Gamma's default link is inverse)
                       data = pha2, 
                       REML = FALSE)
| model | df | AICc | BIC | r2m | r2c |
|---|---|---|---|---|---|
| pha.2.p3b | 11 | -32.7787 | -0.8166702 | 0.2153506 | 0.2153506 |
| pha.2.p3b.gamma | 11 | -136.0719 | -104.1099081 | 0.2635317 | 0.2635317 |

From this model comparison we can see that the model fitted with the Gamma distribution performs much better than the model fitted with the Gaussian distribution. Let’s look at the model validation plots for our Gamma model.

Model re-re-validation

performance

Gamma distribution

Looks better

DHARMa

Gamma (log)
pha.2.p3b.gamma %>% simulateResiduals(plot=TRUE)

## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help. 
##  
## Scaled residual values: 0.288 0.176 0.888 0.212 0.16 0.3 0.664 0.008 0.388 0.504 0.6 0.676 0.6 0.888 0.656 0.368 0.012 0.736 0.552 0.86 ...
pha.2.p3b.gamma %>% DHARMa::testResiduals(plot=TRUE)

## $uniformity
## 
##  Asymptotic one-sample Kolmogorov-Smirnov test
## 
## data:  simulationOutput$scaledResiduals
## D = 0.059044, p-value = 0.6364
## alternative hypothesis: two-sided
## 
## 
## $dispersion
## 
##  DHARMa nonparametric dispersion test via sd of residuals fitted vs.
##  simulated
## 
## data:  simulationOutput
## dispersion = 0.7999, p-value = 0.4
## alternative hypothesis: two.sided
## 
## 
## $outliers
## 
##  DHARMa outlier test based on exact binomial test with approximate
##  expectations
## 
## data:  simulationOutput
## outliers at both margin(s) = 0, observations = 159, p-value = 0.6421
## alternative hypothesis: true probability of success is not equal to 0.007968127
## 95 percent confidence interval:
##  0.00000000 0.02293344
## sample estimates:
## frequency of outliers (expected: 0.00796812749003984 ) 
##                                                      0

Looks much better!

The Gamma distribution does a decent job of modelling our data, so we can move forward with it and start to investigate the model.

Partial plots

ggemmeans

plot_model
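The partial plots in these tabs can be generated along these lines (a sketch using the ggeffects and sjPlot packages; the arguments are illustrative):

```r
library(ggeffects)  # ggemmeans()
library(sjPlot)     # plot_model()

# Model-predicted immune response across the temperature range, by region
pha.2.p3b.gamma %>%
  ggemmeans(terms = c("TEMPERATURE [all]", "REGION")) %>%
  plot(add.data = TRUE)

# Coefficient (estimate) plot for the fixed effects
plot_model(pha.2.p3b.gamma, type = "est")
```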

Model investigation

summary

| | Estimate | Std. Error | z value | p value |
|---|---|---|---|---|
| (Intercept) | -1.3847661 | 0.0907111 | -15.2656703 | 0.0000000 |
| REGIONLeading | -0.1636031 | 0.1348218 | -1.2134770 | 0.2249475 |
| poly(TEMPERATURE, 3)1 | -4.8549609 | 1.1206614 | -4.3322281 | 0.0000148 |
| poly(TEMPERATURE, 3)2 | -2.6825503 | 1.1140906 | -2.4078385 | 0.0160473 |
| poly(TEMPERATURE, 3)3 | 1.1279198 | 1.1107719 | 1.0154378 | 0.3098972 |
| REGIONLeading:poly(TEMPERATURE, 3)1 | 1.3686326 | 1.6384364 | 0.8353285 | 0.4035328 |
| REGIONLeading:poly(TEMPERATURE, 3)2 | -2.4701223 | 1.6562860 | -1.4913622 | 0.1358664 |
| REGIONLeading:poly(TEMPERATURE, 3)3 | 0.3600876 | 1.6536806 | 0.2177492 | 0.8276245 |

Anova

| | Chisq | Df | Pr(>Chisq) |
|---|---|---|---|
| REGION | 1.421053 | 1 | 0.2332302 |
| poly(TEMPERATURE, 3) | 50.414204 | 3 | 0.0000000 |
| REGION:poly(TEMPERATURE, 3) | 2.933305 | 3 | 0.4020230 |

confint

| | 2.5 % | 97.5 % | Estimate |
|---|---|---|---|
| (Intercept) | -1.5625566 | -1.2069755 | -1.3847661 |
| REGIONLeading | -0.4278489 | 0.1006427 | -0.1636031 |
| poly(TEMPERATURE, 3)1 | -7.0514170 | -2.6585049 | -4.8549609 |
| poly(TEMPERATURE, 3)2 | -4.8661279 | -0.4989728 | -2.6825503 |
| poly(TEMPERATURE, 3)3 | -1.0491531 | 3.3049927 | 1.1279198 |
| REGIONLeading:poly(TEMPERATURE, 3)1 | -1.8426438 | 4.5799090 | 1.3686326 |
| REGIONLeading:poly(TEMPERATURE, 3)2 | -5.7163832 | 0.7761385 | -2.4701223 |
| REGIONLeading:poly(TEMPERATURE, 3)3 | -2.8810668 | 3.6012420 | 0.3600876 |
| Std.Dev.(Intercept)\|FISH_ID:POPULATION | 0.0000000 | Inf | 0.0000462 |
| Std.Dev.(Intercept)\|POPULATION | 0.0000024 | 691.9122734 | 0.0410140 |

r-squared

| R2_conditional | R2_marginal | optional |
|---|---|---|
| 0.265392 | 0.2635317 | FALSE |

Note that the random effects within this model are explaining very little variance, and are largely non-informative.

Pairwise comparisons

emtrends [latitudes]

pha.2.p3b.gamma %>% emtrends(var = "TEMPERATURE", type = "response") %>% pairs(by = "TEMPERATURE") %>% summary(by = NULL, adjust = "tukey", infer=TRUE)

SCROLL TO THE RIGHT –>

The values in the left-most column of the table indicate that the slopes are estimated at the mean values of the other model covariates when looking at differences between latitudinal slopes.

emmeans [latitudes]

pha.2.p3b.gamma %>% emmeans(pairwise ~ TEMPERATURE*REGION, type = "response") %>% pairs(by = "TEMPERATURE") %>% summary(by = NULL, adjust = "tukey", infer=TRUE)

temperature

pha.2.p3b.gamma %>% emmeans(~ TEMPERATURE*REGION, type = "response")  %>% summary(infer=TRUE)

Means - f(temperature)

pha.2.p3b.gamma %>% update(.~ 1 + REGION* as.factor(TEMPERATURE) + (1|POPULATION/FISH_ID)) %>% 
  emmeans(~REGION*TEMPERATURE, type = "response") %>% summary(infer=TRUE)

Abs. diff - f(temperature)

pha.2.p3b.gamma %>% update(.~ 1 + REGION* as.factor(TEMPERATURE) + (1|POPULATION/FISH_ID)) %>% 
  emmeans(~REGION*TEMPERATURE, type = "response") %>% pairs(by ="REGION") %>% summary(infer=TRUE)

effect size [latitudes]

immuno.emm <- pha.2.p3b.gamma %>% emmeans(~REGION*TEMPERATURE)
eff_size(immuno.emm, sigma = sigma(pha.2.p3b.gamma), edf=df.residual(pha.2.p3b.gamma))
##  contrast                                                              
##  Core TEMPERATURE28.9339622641509 - Leading TEMPERATURE28.9339622641509
##  effect.size    SE  df asymp.LCL asymp.UCL
##       -0.105 0.265 Inf    -0.626     0.415
## 
## sigma used for effect sizes: 0.815 
## Confidence level used: 0.95

Summary figure
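A summary figure can be built from the estimated marginal means, in the style of the manuscript figures at the end of this document (a sketch; the temperature grid and aesthetics are placeholders):

```r
library(emmeans)
library(ggplot2)

# Predicted immune response (back-transformed from the log link) across a
# fine temperature grid, by region
immuno.emm.df <- pha.2.p3b.gamma %>%
  emmeans(~ REGION * TEMPERATURE,
          at = list(TEMPERATURE = seq(27, 31.5, by = 0.1)),
          type = "response") %>%
  as.data.frame()

ggplot(immuno.emm.df, aes(x = TEMPERATURE, y = response,
                          color = REGION, fill = REGION)) +
  geom_line() +
  geom_ribbon(aes(ymin = asymp.LCL, ymax = asymp.UCL), alpha = 0.2, color = NA) +
  theme_classic() +
  ylab("IMMUNE RESPONSE (mm)") + xlab("TEMPERATURE")
```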

Conclusion

  • In conclusion, while immunocompetence is significantly related to temperature, there is no significant difference in immunocompetence between fish from the low- and high-latitude regions.

Hematocrit

Scenario

For initial details on the experiment performed, please read the ReadMe file. In brief, Acanthochromis polyacanthus from two different regions on the Great Barrier Reef (GBR) were tested for metabolic performance at four different temperatures: 27\(^\circ\)C, 28.5\(^\circ\)C, 30\(^\circ\)C, and 31.5\(^\circ\)C. Fish used in this study were collected from two different regions, low- (i.e., Cairns) and high-latitude (i.e., Mackay); within each region, fish were collected from a total of three different populations.

Blood samples for hematocrit measurements were collected two weeks after fish underwent respirometry testing at the final experimental temperature (31.5\(^\circ\)C). Hematocrit ratios were measured by comparing the amount of packed red blood cells to blood plasma after blood samples were collected via capillary tubes.
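As a sketch (the data frame and column names are hypothetical placeholders), the hematocrit ratio could be computed from the capillary-tube measurements as:

```r
library(dplyr)

# Hypothetical sketch: hematocrit as the proportion of the blood column
# occupied by packed red blood cells (column names are assumptions).
hema_raw %>%
  mutate(PERC_RBC = RBC_LENGTH / (RBC_LENGTH + PLASMA_LENGTH))
```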

Read in the data

Before beginning always make sure that you are working in the correct directory

knitr::opts_knit$set(root.dir=working.dir)

Now we can import the data. Replace import.data with the PATH to your data file. I have labelled my PATH import.data (i.e., import.data = “PATH TO MY FILE”).

Load data

hema <- import.data

Data manipulation

Before the data can be analysed it is important to clean up the data file. Below a number of adjustments are made, primarily making sure that columns are being treated appropriately as either factors, numeric, or as time, as well as the renaming of some columns.

hema <-  hema %>% 
  mutate(PERC_RBC = as.numeric(PERC_RBC), 
         MASS = as.numeric(MASS),
         MASS_CENTERED = scale(MASS, center = TRUE, scale = FALSE)) %>% 
  drop_na(PERC_RBC)

Great! That is everything for data manipulation

Exploratory data analysis

HEMATOCRIT V LATITUDE

ggplot(hema, aes(y=PERC_RBC, x=REGION)) + 
  geom_boxplot() + 
  theme_classic() 

HEMATOCRIT V LATITUDE (distr)

hema %>% ggplot(aes(x=PERC_RBC)) + 
  geom_density() +
  facet_wrap(~REGION)

Fit the model

The model was fit using the glm and later the glmmTMB package in R. A number of different models were tested to determine which hypothesis and associated variables best predicted hematocrit. Model fit was examined using AICc, BIC, and r-squared values. Additional model validation was performed via the diagnostics provided by the performance and DHARMa packages in R.

Fixed factors (linear regression models)

model 1
#--- base model ---#
hema.1 <- glm(PERC_RBC ~ REGION, 
                family = gaussian(),  
                data = hema) 
summary
| | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | 0.2236353 | 0.0121433 | 18.41628 | 0.0000000 |
| REGIONLeading | 0.0355744 | 0.0181554 | 1.95944 | 0.0578398 |
model 2
#--- model with mass ---#
hema.2 <- glm(PERC_RBC ~ REGION + MASS_CENTERED, 
                 family = gaussian(), 
                 data = hema)  
summary
| | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | 0.2251300 | 0.0137982 | 16.3159093 | 0.0000000 |
| REGIONLeading | 0.0322334 | 0.0230904 | 1.3959651 | 0.1715176 |
| MASS_CENTERED | -0.0003688 | 0.0015402 | -0.2394485 | 0.8121545 |
model comparison table
| model | df | AICc | BIC | r2 |
|---|---|---|---|---|
| hema.1 | 3 | -107.0515 | -102.84462 | 0.0940123 |
| hema.2 | 4 | -104.6075 | -99.26924 | 0.0930529 |

There is little difference between the two initial models; however, the model that does not include MASS_CENTERED performs better according to the model comparison scores. Therefore, we will move forward with the first and simplest model.

Random factors

Fish were only sampled once, therefore, there is no need to include individual as a random factor. However, we will test how the inclusion of POPULATION influences the model.

random factor models
hema.1 <- glmmTMB(PERC_RBC ~ REGION, 
                   family = gaussian(), 
                   REML = TRUE, 
                   data = hema) 

hema.1b <- glmmTMB(PERC_RBC ~ REGION + (1|POPULATION), 
                    family = gaussian(), 
                    REML = TRUE, 
                    data = hema) 

hema.1c <- glmmTMB(PERC_RBC ~ REGION + (REGION|POPULATION), 
                    family = gaussian(), 
                    REML = TRUE, 
                    data = hema) # convergence problem
## Warning in fitTMB(TMBStruc): Model convergence problem; singular convergence
## (7). See vignette('troubleshooting')
random factor model comparisons
| model | df | AICc | BIC | r2m | r2c |
|---|---|---|---|---|---|
| hema.1 | 3 | -93.24011 | -89.03324 | 0.0940123 | 0.0940123 |
| hema.1b | 4 | -90.73388 | -85.39565 | 0.0940123 | 0.0940123 |
| hema.1c | 6 | -85.58393 | -78.46809 | 0.0958933 | 0.1526837 |

The inclusion of random effects does not help explain additional variation, and therefore random effects will not be included in the model. Note the final model will be run using glm and not glmmTMB, because we are not using a mixed model.

Model validation

performance

hema.1

DHARMa residuals

hema.1
hema.1 %>% simulateResiduals(plot=TRUE)

## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help. 
##  
## Scaled residual values: 0.892 0.956 0.616 0.744 0.06 0.548 0.1 0.06 0.108 0.148 0.188 0.06 0.628 0.392 0.396 0.06 0.912 0.968 0.84 0.276 ...
hema.1 %>% DHARMa::testResiduals(plot=TRUE)

## $uniformity
## 
##  Asymptotic one-sample Kolmogorov-Smirnov test
## 
## data:  simulationOutput$scaledResiduals
## D = 0.084842, p-value = 0.9473
## alternative hypothesis: two-sided
## 
## 
## $dispersion
## 
##  DHARMa nonparametric dispersion test via sd of residuals fitted vs.
##  simulated
## 
## data:  simulationOutput
## dispersion = 0.97134, p-value = 0.952
## alternative hypothesis: two.sided
## 
## 
## $outliers
## 
##  DHARMa outlier test based on exact binomial test with approximate
##  expectations
## 
## data:  simulationOutput
## outliers at both margin(s) = 0, observations = 38, p-value = 1
## alternative hypothesis: true probability of success is not equal to 0.007968127
## 95 percent confidence interval:
##  0.00000000 0.09251276
## sample estimates:
## frequency of outliers (expected: 0.00796812749003984 ) 
##                                                      0

The basic model looks good and passes the validation checks.

Partial plots

ggemmeans

plot_model

Model investigation

summary

| | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | 0.2236353 | 0.0121433 | 18.41628 | 0.0000000 |
| REGIONLeading | 0.0355744 | 0.0181554 | 1.95944 | 0.0578398 |

Anova

| | LR Chisq | Df | Pr(>Chisq) |
|---|---|---|---|
| REGION | 3.839405 | 1 | 0.0500613 |

confint

## Waiting for profiling to be done...
| | 2.5 % | 97.5 % |
|---|---|---|
| (Intercept) | 0.1998348 | 0.2474358 |
| REGIONLeading | -0.0000095 | 0.0711583 |

r-squared

| | R2 |
|---|---|
| R2 | 0.0963721 |

Pairwise comparisons

emmeans [latitudes]

hema.1 %>% emmeans(pairwise ~ REGION, type = "response")
## $emmeans
##  REGION  emmean     SE df lower.CL upper.CL
##  Core     0.224 0.0121 36    0.199    0.248
##  Leading  0.259 0.0135 36    0.232    0.287
## 
## Confidence level used: 0.95 
## 
## $contrasts
##  contrast       estimate     SE df t.ratio p.value
##  Core - Leading  -0.0356 0.0182 36  -1.959  0.0578

effect size [latitudes]

hema.emm <- emmeans(hema.1, "REGION")
eff_size(hema.emm, sigma = sigma(hema.1), edf=df.residual(hema.1))
##  contrast       effect.size    SE df lower.CL upper.CL
##  Core - Leading      -0.639 0.335 36    -1.32   0.0398
## 
## sigma used for effect sizes: 0.05565 
## Confidence level used: 0.95

Summary figure
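A summary figure for hematocrit can be sketched from the estimated marginal means (aesthetics and layout are placeholders):

```r
library(emmeans)
library(ggplot2)

# Region-level means with 95% confidence intervals over the raw observations
hema.emm.df <- hema.1 %>% emmeans(~ REGION) %>% as.data.frame()

ggplot(hema.emm.df, aes(x = REGION, y = emmean, color = REGION)) +
  geom_jitter(data = hema, aes(y = PERC_RBC), width = 0.1, alpha = 0.3) +
  geom_pointrange(aes(ymin = lower.CL, ymax = upper.CL)) +
  theme_classic() +
  ylab("HEMATOCRIT (proportion RBC)") + xlab("")
```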

Conclusion

  • In conclusion, there is no significant difference in hematocrit ratios between A. polyacanthus from the low- and high-latitude regions at 31.5\(^\circ\)C.

General summary

MAXIMUM METABOLIC RATE and ABSOLUTE AEROBIC SCOPE showed significant differences in the temperature response curve between fish collected from the low-latitude region and fish collected from the high-latitude (Mackay) region of the Great Barrier Reef. However, no significant difference was seen in RESTING METABOLIC RATE. All oxygen consumption metrics showed significant positive relationships with temperature.

Similarly, the two enzymes analysed in this study showed significant positive relationships with temperature, but no significant differences between low- and high-latitude regions. Immunocompetence displayed a hill-shaped relationship with temperature, represented via a third order polynomial model; once again, no significant differences were observed between low- and high-latitude regions.

No significant differences were observed between low- and high-latitude regions with respect to hematocrit ratios; however, the difference between regions was marginal (p = 0.058) and deserves further investigation in future research.

Further analysis of results can be seen within the paper titled “[INSERT TITLE HERE]”, doi: XXXX

Figures

Figure for manuscript

Figure 2

rmr.g2 <- ggplot(rmr.emm.df, aes(y=emmean, x=TEMPERATURE, color=REGION, linetype=REGION))+
  geom_jitter(data=rmr.obs, aes(y=Fit, color=REGION), width=0.05, alpha = 0.3) +
  stat_smooth(method = "lm", 
              formula =y ~ poly(x, 2, raw=TRUE)) + 
  geom_ribbon(aes(x=TEMPERATURE, ymin= lower.CL, ymax= upper.CL, fill = REGION), 
              alpha = 0.2, color=NA)+ 
  scale_x_continuous(limits = c(26.9, 31.6), breaks = seq(27, 31.5, by = 1.5))+ 
  scale_y_continuous(limits = c(2,12), breaks = seq(2, 12, by = 2)) +
  theme_classic() + ylab(expression("RESTING METABOLIC RATE (MgO "[2]* " hr"^{-1} * ")")) + xlab("")+
  scale_linetype_manual(values = c("solid", "dashed"), labels = c("Low-latitude","High-latitude")) +
  scale_color_manual(values=c("#B2182B", "#4393C3"), labels = c("Low-latitude","High-latitude")) +
  scale_fill_manual(values=c("#B2182B", "#4393C3"), labels = c("Low-latitude","High-latitude")) + 
  theme(legend.position = "top", 
        legend.text = element_text(size = 10), 
        legend.title = element_blank(), 
        axis.title = element_text(size =12), 
        axis.text = element_text(size=10)) + 
  annotate("text", x=30, y= 11.5, label="P =0.51", fontface="italic", size=5)

mmr.g2 <- ggplot(mmr.emm.df, aes(y=emmean, x=TEMPERATURE, color=REGION, linetype=REGION))+
  geom_jitter(data=mmr.obs, aes(y=Fit, color=REGION), width=0.05, alpha = 0.3) + 
  stat_smooth(method = "lm", 
              formula =y ~ poly(x, 2, raw=TRUE)) + 
  geom_ribbon(aes(x=TEMPERATURE, ymin= lower.CL, ymax= upper.CL, fill = REGION), 
              alpha = 0.2, color=NA) +
  scale_x_continuous(limits = c(26.9, 31.6), breaks = seq(27, 31.5, by = 1.5))+ 
  scale_y_continuous(limits = c(6,28), breaks = seq(6, 28, by = 2)) +
  theme_classic() + ylab(expression("MAXIMUM OXYGEN CONSUMPTION (MgO "[2]* " hr"^{-1} * ")")) + xlab("")+
  scale_linetype_manual(values = c("solid", "dashed"), labels = c("Low-latitude","High-latitude")) +
  scale_color_manual(values=c("#B2182B", "#4393C3"), labels = c("Low-latitude","High-latitude")) + 
  scale_fill_manual(values=c("#B2182B", "#4393C3"), labels = c("Low-latitude","High-latitude")) + 
  theme(legend.position = 'none', 
        legend.text = element_text(size = 10), 
        legend.title = element_blank(), 
        axis.title = element_text(size =12), 
        axis.text = element_text(size=10))+
  annotate("text", x=30, y= 27, label="P =0.0010", fontface="italic", size=5) 

nas.g2 <- ggplot(nas.emm.df, aes(y=emmean, x=TEMPERATURE, color=REGION, linetype=REGION)) + 
  geom_jitter(data=nas.obs, aes(y=Fit, color=REGION), width=0.05, alpha = 0.3) +
  stat_smooth(method = "lm", 
              formula =y ~ poly(x, 2, raw=TRUE)) + 
  geom_ribbon(aes(x=TEMPERATURE, ymin= lower.CL, ymax= upper.CL, fill = REGION), 
              alpha = 0.2, color=NA)+
  scale_y_continuous(limits = c(4,20), breaks = seq(4, 20, by = 2)) + 
  scale_x_continuous(limits = c(26.9, 31.6), breaks = seq(27, 31.5, by = 1.5))+
  theme_classic() + ylab(expression("ABSOLUTE AEROBIC SCOPE (MgO "[2]* " hr"^{-1} * ")")) +
  scale_linetype_manual(values = c("solid", "dashed"), labels = c("Low-latitude","High-latitude")) +
  scale_color_manual(values=c("#B2182B", "#4393C3"), labels = c("Low-latitude","High-latitude")) +
  scale_fill_manual(values=c("#B2182B", "#4393C3"), labels = c("Low-latitude","High-latitude")) + 
  theme(legend.position = "none", 
        legend.text = element_text(size = 10), 
        legend.title = element_blank(), 
        axis.title = element_text(size =12), 
        axis.text = element_text(size=10)) + 
  annotate("text", x=30, y= 19, label="P =0.0010", fontface="italic", size=5)
fig2 <- ggarrange(rmr.g2, mmr.g2, nas.g2, 
          nrow = 3, 
          ncol=1, 
          align = "v",
          labels = c("A","B","C"),
          common.legend = TRUE); fig2
## Warning: Removed 4 rows containing missing values (`geom_point()`).
Fig. 2: Thermal performance curves of resting oxygen consumption (A), maximum oxygen consumption (B), and absolute aerobic scope (C) of fish from low- (solid red line) and high-latitude (dashed blue line) regions across four different temperatures. Ribbons represent 95% confidence intervals.
ggsave("C:/Users/jc527762/OneDrive - James Cook University/PhD dissertation/Data/Chapter1_LocalAdaptation/figures/Figure2.pdf", fig2, device="pdf", width=6.6, height = 19.86, units = "in", dpi=1200)

Figure 3

pha.emm <- emmeans(pha.2.p3b.gamma, ~ TEMPERATURE*REGION, 
               at = list(TEMPERATURE = seq(from=27, to = 31.5, by=.1)), 
               type='response')
pha.emm.df=as.data.frame(pha.emm)

pha.obs <-  pha2 %>% 
  mutate(Pred=predict(pha.2.p3b.gamma, re.form=NA),
         Resid = residuals(pha.2.p3b.gamma, type='response'),
         Fit = Pred + Resid)

pha.g2 <- ggplot(pha.emm.df, aes(y=response, x=TEMPERATURE, color = REGION, linetype=REGION)) + 
  stat_smooth(method = "lm", se=FALSE,
              formula =y ~ poly(x, 3, raw=TRUE)) +  
  geom_ribbon(aes(x=TEMPERATURE, ymin= asymp.LCL, ymax= asymp.UCL, fill = REGION), 
              alpha = 0.2, color=NA) + 
  #scale_y_continuous(limits = c(0,0.9), breaks = seq(0, 0.9, by =0.15)) + 
  theme_classic() + ylab("PHA RESPONSE (mm)") + 
  scale_linetype_manual(values = c("solid", "dashed"), labels = c("Low-latitude","High-latitude")) +
  scale_color_manual(values=c("#B2182B", "#4393C3"), labels = c("Low-latitude","High-latitude")) + 
  scale_fill_manual(values=c("#B2182B", "#4393C3"), labels = c("Low-latitude","High-latitude")) +
  theme(legend.position = c(0.855,0.8), 
        legend.text = element_text(size = 10), 
        legend.title = element_blank(), 
        axis.title = element_text(size =12), 
        axis.text = element_text(size=10)) + 
  annotate("text", x=30, y=0.495, fontface="italic", size=5, label="P =0.85")
pha.g2
Fig. 3: Thermal performance curve of swelling response of the caudal peduncle ~18-24 hours post injection of phytohemagglutinin across four different experimental temperatures. Solid red lines represent low-latitude populations. Dashed blue line represents high-latitude populations. Ribbon represents 95% confidence intervals.
ggsave("C:/Users/jc527762/OneDrive - James Cook University/PhD dissertation/Data/Chapter1_LocalAdaptation/figures/Figure3.pdf", pha.g2, device="pdf", width=6.6, height = 5, units = "in", dpi=1200)

Figure 4

#---ldh---#
ldh.emm <- emmeans(ldh.model.1.p3a, ~ temperature*REGION, 
                   at = list(temperature = seq(from=20, to = 50, by=1)))
ldh.emm.df=as.data.frame(ldh.emm)

ldh.obs <- ldh.data %>% 
  mutate(Pred = predict(ldh.model.1.p3a, re.form=NA), 
         Resid = residuals(ldh.model.1.p3a, type = 'response'), 
         Fit = Pred + Resid)

cldh2 <- ggplot(ldh.emm.df, aes(y=emmean, x=temperature, color=REGION, fill=REGION, linetype=REGION)) + 
  stat_smooth(method = "lm", se=FALSE, 
              formula =y ~ poly(x, 3, raw=TRUE)) + 
  geom_ribbon(aes(x=temperature, ymin= lower.CL, ymax= upper.CL, fill = REGION), 
              alpha = 0.2, color=NA) +
  geom_jitter(data=ldh.obs, aes(y=Fit, color=REGION), width=0.05, alpha = 0.3) +
  scale_y_continuous(limits = c(0,250), breaks = seq(0, 250, by =50)) + 
  theme_classic() + ylab(expression("LDH ACTIVITY (U mg "^{-1}*" tissue)")) + xlab("TEMPERATURE") +
  scale_linetype_manual(values = c("solid", "dashed"), labels = c("Low-latitude","High-latitude")) + 
  scale_color_manual(values=c("#B2182B", "#4393C3"), labels = c("Low-latitude","High-latitude")) +
  scale_fill_manual(values=c("#B2182B", "#4393C3"), labels = c("Low-latitude","High-latitude")) +
  theme(legend.position = c(0.80,0.2), 
        legend.text = element_text(size = 10), 
        legend.title = element_blank(), 
        axis.title = element_text(size =12), 
        axis.text = element_text(size=10)) + 
  annotate("text", x=40, y=240, label="P =0.98", fontface = 'italic', size = 5)

#--- cs ---#
cs.emm <- emmeans(cs.model.1a.log.p2, ~ TEMPERATURE*REGION, type='response',
                   at = list(TEMPERATURE = seq(from=20, to = 50, by=1)), 
                  )
cs.emm.df=as.data.frame(cs.emm)

cs.obs <- CS.data %>% 
  mutate(Pred = predict(cs.model.1a.log.p2, re.form=NA, type= 'response'), 
         Resid = residuals(cs.model.1a.log.p2, type = 'response'), 
         Fit = Pred + Resid)

cs.plot2 <- ggplot(cs.emm.df, aes(y=response, x=TEMPERATURE, color=REGION, fill=REGION, linetype=REGION)) + 
  stat_smooth(method = "lm", se=FALSE, 
              formula =y ~ poly(x, 2, raw=TRUE)) +  
  geom_jitter(data=cs.obs, aes(y=Fit, color=REGION), width=0.05, alpha = 0.3) +
  geom_ribbon(aes(x=TEMPERATURE, ymin= lower.CL, ymax= upper.CL, fill = REGION), 
              alpha = 0.2, color=NA) +
  scale_y_continuous(limits = c(0,10), breaks = seq(0, 10, by =2)) + 
  theme_classic() + ylab(expression("CS ACTIVITY (U mg "^{-1}*" tissue)")) + xlab("TEMPERATURE") +
  scale_linetype_manual(values = c("solid", "dashed"), labels = c("Low-latitude","High-latitude")) +
  scale_color_manual(values=c("#B2182B", "#4393C3"), labels = c("Low-latitude","High-latitude")) +
  scale_fill_manual(values=c("#B2182B", "#4393C3"), labels = c("Low-latitude","High-latitude"))+
  theme(legend.position = 'none', 
        legend.text = element_text(size = 10), 
        legend.title = element_blank(), 
        axis.title = element_text(size =12), 
        axis.text = element_text(size=10))+
  annotate("text", x=40, y=9.8, label="P =0.15", fontface = 'italic', size = 5)

#--- ldh:cs ratio ---#
ldh.cs.emm <- emmeans(ldh.cs.model.1a, ~ temperature*REGION, type='response',
                   at = list(temperature = seq(from=20, to = 50, by=1)), 
                  )
ldh.cs.emm.df=as.data.frame(ldh.cs.emm)

ldh.cs.obs <- ldh.cs.data %>% 
  mutate(Pred = predict(ldh.cs.model.1a, re.form=NA, type= 'response'), 
         Resid = residuals(ldh.cs.model.1a, type = 'response'), 
         Fit = Pred + Resid)

ldh.cs.plot2 <- ggplot(ldh.cs.emm.df, aes(y=emmean, x=temperature, color=REGION, fill=REGION, linetype=REGION)) + 
  stat_smooth(method = "lm", se=FALSE, 
              formula =y ~ poly(x, 2, raw=TRUE)) +  
  geom_jitter(data=ldh.cs.obs, aes(y=Fit, color=REGION), width=0.05, alpha = 0.3) +
  geom_ribbon(aes(x=temperature, ymin= lower.CL, ymax= upper.CL, fill = REGION), 
              alpha = 0.2, color=NA) +
  scale_y_continuous(limits = c(0,50), breaks = seq(0, 50, by =10)) + 
  theme_classic() + ylab("LDH:CS RATIO") + xlab("TEMPERATURE") +
  scale_linetype_manual(values = c("solid", "dashed"), labels = c("Low-latitude","High-latitude")) +
  scale_color_manual(values=c("#B2182B", "#4393C3"), labels = c("Low-latitude","High-latitude")) +
  scale_fill_manual(values=c("#B2182B", "#4393C3"), labels = c("Low-latitude","High-latitude"))+
  theme(legend.position = 'none', 
        legend.text = element_text(size = 10), 
        legend.title = element_blank(), 
        axis.title = element_text(size =12), 
        axis.text = element_text(size=10))+
  annotate("text", x=40, y=48, label="P =0.91", fontface = 'italic', size = 5)
fig4 <- ggarrange(cldh2, cs.plot2, ldh.cs.plot2, 
          nrow = 3, 
          ncol=1, 
          align = "v",
          labels = c("A","B","C"),
          common.legend = TRUE)

fig4
Fig.4: Thermal performance curve of maximal activity of A) lactate dehydrogenase (LDH), B) citrate synthase (CS), and C) LDH:CS ratio of low- (solid red line) and high-latitudinal (dashed blue line) populations across four experimental temperatures (i.e., 20°C, 30°C, 40°C, 50°C). Ribbons represent 95% confidence intervals.
ggsave("C:/Users/jc527762/OneDrive - James Cook University/PhD dissertation/Data/Chapter1_LocalAdaptation/figures/Figure4.pdf", fig4, device="pdf", width=6.6, height = 19.86, units = "in", dpi=1200)

Supplemental figures for manuscript

Supplemental figure 4

hema.newdata <-  hema.1 %>% ggemmeans(~REGION) %>% 
  as.data.frame() %>% 
  dplyr::rename(REGION = x)

obs <- hema %>% 
  mutate(Pred = predict(hema.1, re.form=NA), 
         Resid = residuals(hema.1, type = "response"), 
         Fit = Pred + Resid)

hema.plot <- ggplot(hema.newdata, aes(y=predicted, x=REGION, color=REGION, linetype=REGION))  + 
  geom_jitter(data=obs, aes(y=Fit, x=REGION, color =REGION), 
              width = 0.05, alpha=0.3)+
  geom_pointrange(aes(ymin=conf.low, 
                      ymax=conf.high), 
                  shape = 19, 
                  size = 1, 
                  position = position_dodge(0.2)) + 
  scale_linetype_manual(values = c("solid", "dashed"), labels = c("Low-latitude","High-latitude")) +
  scale_color_manual(values=c("#B2182B", "#4393C3"), labels = c("Low-latitude","High-latitude")) +
  ylab("HEMATOCRIT RATIO") +
  scale_x_discrete(name = "", 
                   labels = c("Low-latitude","High-latitude"))+
  theme_classic() + 
  theme(legend.position = 'top', 
        legend.text = element_text(size = 10), 
        legend.title = element_blank(), 
        axis.title = element_text(size =12), 
        axis.text = element_text(size=10))  + 
  annotate("text", x=1.5, y=0.275, fontface="italic", size=5, label="P =0.057")
hema.plot
Supplemental Fig. 4: Comparison of hematocrit ratios, measured at 31.5°C, between low- (red) and high-latitude (blue) populations. No significant difference was observed between regions (p = 0.057). Solid (low-latitude) and dashed (high-latitude) lines represent 95% confidence intervals.
ggsave("C:/Users/jc527762/OneDrive - James Cook University/PhD dissertation/Data/Chapter1_LocalAdaptation/supplemental_figures/Supplemental_figure4.pdf", hema.plot, device="pdf", width=6.6, height = 5, units = "in", dpi=1200)

Supplemental figure 3

library(ggridges)
mass.distr <- resp4 %>% distinct(FISH_ID, .keep_all = TRUE) %>% 
  mutate(CHAMBER_RATIO = 1.5/DRY_WEIGHT) %>%
  ggplot(aes(x=CHAMBER_RATIO, y=REGION, fill=REGION)) + 
  scale_fill_manual(values=c("#B2182B", "#4393C3"), labels = c("Low-latitude","High-latitude")) +
  ylab("")+ scale_x_continuous(limits = c(20,150), breaks = seq(20, 150, by =20)) + 
  geom_density_ridges(scale = 2, jittered_points=TRUE, position = position_points_jitter(height = 0),
                      point_shape = '|', point_size = 3, point_alpha = 1, alpha = 0.7) + 
  theme_classic() + 
  theme(axis.title.y=element_blank(),
        axis.text.y=element_blank(), 
        axis.ticks.y=element_blank())
mass.distr
## Picking joint bandwidth of 5.99
Supplemental Fig. 3: Density plots displaying fish body-size-to-chamber ratios. Fish sampled for aerobic physiology from the low-latitude region are shown in red; fish from the high-latitude region are shown in blue.
ggsave("C:/Users/jc527762/OneDrive - James Cook University/PhD dissertation/Data/Chapter1_LocalAdaptation/supplemental_figures/Supplemental_figure3.pdf",mass.distr, device="pdf", width=6.6, height = 5, units = "in", dpi=1200)
## Picking joint bandwidth of 5.99